The first two chunks of this R markdown file after the setup chunk allow for plot zooming, but this also means that the HTML file must be opened in a browser to view the document properly. When the file knits in RStudio the preview will appear empty, but the HTML file, when opened in a browser, will have all the information, and you can click on each plot to zoom in on it.

Before you begin

Notes

If you have questions, please email the most recent author, currently:

Marissa A. Dyck
Postdoctoral research fellow
University of Victoria
School of Environmental Studies
Email: marissadyck17@gmail.com

(update/add authors as needed)

R and RStudio

Before starting you should ensure you have the latest version of R and RStudio downloaded. This code was generated under R version 4.2.3 and with RStudio version 2024.04.2+764.
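
As a quick check, the running R version can be compared against the version this script was written under. This is a minimal sketch using only base R; the variable names are illustrative, and it does not check the RStudio version.

```r
# version this script was written under (taken from the text above)
script_r_version <- '4.2.3'

# getRversion() returns the running R version as a comparable object
current_r_version <- getRversion()
meets_minimum <- current_r_version >= script_r_version

# report the result without stopping the script
message('Running R ', as.character(current_r_version),
        if (meets_minimum) ' (at or above ' else ' (below ',
        script_r_version, ')')
```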

You can download R and RStudio HERE

R markdown

This script is written in R markdown and thus mixes markup with R code. If you are planning to run this script with new data or make any modifications, you will want to be familiar with some basics of R markdown.

Below is an R markdown cheatsheet to help you get started,
R markdown cheatsheet

Install packages

If you don’t already have the following packages installed, use the code below to install them. NOTE: this will not run automatically because eval=FALSE is included in the chunk setup (i.e., I don’t want it to run every time I run this code, since I already have the packages installed).

install.packages('tidyverse')
install.packages('PerformanceAnalytics')
install.packages('Hmisc')
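
An alternative to installing everything by hand is to install only the packages that are missing. The sketch below is one way to do that in base R; `find_missing_packages()` is a hypothetical helper written for this example, and the install line is left commented out so nothing is installed unless you choose to run it.

```r
# packages this script needs (from the install chunk above)
required_packages <- c('tidyverse', 'PerformanceAnalytics', 'Hmisc')

# hypothetical helper: return the packages in `pkgs` that are not installed
find_missing_packages <- function(pkgs) {
  setdiff(pkgs, rownames(installed.packages()))
}

missing_packages <- find_missing_packages(required_packages)
print(missing_packages)

# uncomment to install only what is missing
# if (length(missing_packages) > 0) install.packages(missing_packages)
```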

Load libraries

Then load the packages to your library.

library(tidyverse) # data tidying, visualization, and much more; this loads the core tidyverse packages, see the complete list using tidyverse_packages()
library(PerformanceAnalytics) # used to generate a correlation plot
library(Hmisc) # used to generate histograms for all variables in a data frame

Covariate data

Import covariate data

We have three data files that represent possible covariates for the analysis and we will import all of them at once here.

  1. SRFN_HFI.csv which contains human footprint inventory (anthropogenic disturbances) on the landscape from ABMI’s Wall-to-Wall Human Footprint Inventory - Year 2021

  2. SRFN_landcover.csv which contains landcover inventory (landcover types) on the landscape from ABMI’s Wall-to-Wall landcover Inventory - Year 2010

  3. SRFN_harvest.csv which contains proportional harvest per year (to be confirmed) from the same source as the HFI data. We extracted this after the fact to get the harvest years, which weren’t in our original download, so we will have to add it back to the data
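
Before importing, it can help to confirm that all three files are where the script expects them. The sketch below assumes the files live in 'data/raw' (the folder used in the import code) and uses a hypothetical helper, `find_missing_files()`, written for this example.

```r
# file names as used in the import code
covariate_files <- c('SRFN_HFI.csv',
                     'SRFN_landcover.csv',
                     'SRFN_harvest.csv')

# hypothetical helper: return the file names that are not found in `dir`
find_missing_files <- function(dir, files) {
  files[!file.exists(file.path(dir, files))]
}

missing_files <- find_missing_files('data/raw', covariate_files)

# report which files, if any, could not be found
if (length(missing_files) > 0) {
  message('Not found in data/raw: ', paste(missing_files, collapse = ', '))
} else {
  message('All covariate files found')
}
```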

# these data files have a similar format so we can read them in together using the map() function in the purrr package

srfn_covariate_data <-    
  # provide file path (e.g. folders to find the data)
  file.path('data/raw',
            
            # provide the file names
            c('SRFN_HFI.csv',
              'SRFN_landcover.csv',
              'SRFN_harvest.csv')) %>%
  
  # use purrr map() to read in the files; .x is a placeholder for each element of the vector passed to map() (here, each file path), so everything inside map() after the ~ is applied to each file in turn
  map(~.x %>%
        read_csv(.,
                 
                 # specify how to read in the various columns
                 col_types = cols(Site = col_factor(),
                                  BUFF_DIST = col_integer(),
                                  .default = col_number())) %>%
        
        # rename site column to site_number for accuracy and for joining data later
        rename(site_number = Site) %>% 
        
        
        # set the column names to lower case which makes it easier to reference them later so we don't have to type in all caps
        set_names(
          names(.) %>% 
            tolower()) %>% 
  
        # reorder columns: site_number, buff_dist, then the rest alphabetically
        select(site_number, buff_dist, sort(setdiff(names(.), c('site_number', 'buff_dist'))))) %>%
          
           
  # set the names of the three elements in the list; if you don't run this they will be indexed numerically (e.g. [[1]], [[2]]) which can get confusing
  purrr::set_names('HFI',
                   'VEG',
                   'harvest')

What we did above is create a list that contains three elements: the three data frames we just read in. We did a bit of data tidying and then named each element (HFI, VEG, and harvest) so we can easily reference them from the list later.
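
To make the list idea concrete, here is a minimal sketch of the same pattern on toy data. It uses base R `lapply()` and `setNames()` as stand-ins for `map()` and `purrr::set_names()`, and the three small data frames hold made-up values, not the real covariates.

```r
# toy stand-ins for the three imported files (made-up values)
toy_raw <- list(
  data.frame(Site = c('1', '2'), BUFF_DIST = c(250L, 250L),
             PIPELINE = c(0, 0.148)),
  data.frame(Site = c('1', '2'), BUFF_DIST = c(250L, 250L),
             `110` = c(0, 0.36), check.names = FALSE),
  data.frame(Site = c('1', '2'), BUFF_DIST = c(250L, 250L),
             `1970` = c(0, 0.342), check.names = FALSE)
)

# apply the same tidying to every element of the list
toy_tidy <- lapply(toy_raw, function(df) {
  names(df)[names(df) == 'Site'] <- 'site_number' # rename the site column
  names(df) <- tolower(names(df))                 # lower-case all names
  df
})

# name the elements so they can be referenced by name instead of position
toy_tidy <- setNames(toy_tidy, c('HFI', 'VEG', 'harvest'))

names(toy_tidy)        # element names
toy_tidy$HFI$pipeline  # list, then data frame, then column
toy_tidy[['VEG']]      # double brackets also return one element
```

The same `list$element$column` pattern is what the checks further down rely on.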

Data checks

Structure

Even though we set some of the columns to read in as a specific type in the data import step, it’s always a good idea to check the internal structure.

str(srfn_covariate_data)
## List of 3
##  $ HFI    : tibble [1,200 × 77] (S3: tbl_df/tbl/data.frame)
##   ..$ site_number                 : Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ buff_dist                   : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
##   ..$ airp-runway                 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ borrowpit-dry               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ borrowpit-wet               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ borrowpits                  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ buffer_area                 : num [1:1200] 196260 196260 196260 196260 196260 ...
##   ..$ camp-industrial             : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ campground                  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ canal                       : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ cfo                         : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ clearing-unknown            : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ clearing-wellpad-unconfirmed: num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ conventional-seismic        : num [1:1200] 0.00 5.41e-05 0.00 0.00 0.00 ...
##   ..$ country-residence           : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ crop                        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ cultivation_abandoned       : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ dugout                      : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ facility-other              : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ facility-unknown            : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ fruit-vegetables            : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ golfcourse                  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ greenspace                  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ grvl-sand-pit               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ harvest-area                : num [1:1200] 0.432 0.342 0 0.388 0.424 ...
##   ..$ harvest-area-white-zone     : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ lagoon                      : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ landfill                    : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ low-impact-seismic          : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ mill                        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ mines-pitlake               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ misc-oil-gas-facility       : num [1:1200] 0 0.131 0 0 0 ...
##   ..$ oil-gas-plant               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ open-pit-mine               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ pipeline                    : num [1:1200] 0 0.148 0.0148 0 0 ...
##   ..$ recreation                  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ reservoir                   : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ residence_clearing          : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ rlwy-mlt-track              : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ rlwy-sgl-track              : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ rlwy-spur                   : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road-gravel-1l              : num [1:1200] 0.00 5.99e-02 7.05e-03 7.11e-06 0.00 ...
##   ..$ road-gravel-2l              : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road-paved-1l               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road-paved-2l               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road-paved-3l               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road-paved-4l               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road-paved-div              : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road-paved-undiv-1l         : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road-paved-undiv-2l         : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road-unclassified           : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road-unimproved             : num [1:1200] 0 0 0 0 0.00675 ...
##   ..$ road-unpaved-2l             : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road-winter                 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ rough_pasture               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ runway                      : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ rural-residence             : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ sump                        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ surrounding-veg             : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ tame_pasture                : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ trail                       : num [1:1200] 0 0 0.011 0 0 ...
##   ..$ transfer_station            : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ transmission-line           : num [1:1200] 0 0 0 0 0 ...
##   ..$ truck-trail                 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ urban-industrial            : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ urban-residence             : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ vegetated-edge-railways     : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ vegetated-edge-roads        : num [1:1200] 0 0.09955 0.0129 0.00112 0.01425 ...
##   ..$ well_cleared_not_confirmed  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ well_cleared_not_drilled    : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ well-aband                  : num [1:1200] 0 0 0 0 0 ...
##   ..$ well-bitumen                : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ well-cased                  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ well-gas                    : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ well-oil                    : num [1:1200] 0 0 0 0 0.0332 ...
##   ..$ well-other                  : num [1:1200] 0 0 0.0183 0.0318 0 ...
##   ..$ well-unknown                : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VEG    : tibble [1,200 × 11] (S3: tbl_df/tbl/data.frame)
##   ..$ site_number: Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ buff_dist  : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
##   ..$ 110        : num [1:1200] 0 0.3608 0.0618 0 0 ...
##   ..$ 120        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 20         : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 210        : num [1:1200] 0.847 0 0.743 0.442 0.284 ...
##   ..$ 220        : num [1:1200] 0 0.18 0 0 0 ...
##   ..$ 230        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 33         : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 34         : num [1:1200] 0 0.4514 0.0716 0.00837 0.04522 ...
##   ..$ 50         : num [1:1200] 0.15301 0.00776 0.12401 0.54941 0.6703 ...
##  $ harvest: tibble [1,200 × 63] (S3: tbl_df/tbl/data.frame)
##   ..$ site_number : Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ buff_dist   : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
##   ..$ 1940        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1950        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1960        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1966        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1967        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1968        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1969        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1970        : num [1:1200] 0 0.342 0 0.209 0 ...
##   ..$ 1971        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1972        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1973        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1974        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1975        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1976        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1977        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1978        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1979        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1980        : num [1:1200] 0 0 0 0 0 ...
##   ..$ 1981        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1982        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1983        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1984        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1985        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1986        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1987        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1988        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1989        : num [1:1200] 0.0285 0 0 0 0 ...
##   ..$ 1990        : num [1:1200] 0 0 0 0 0 ...
##   ..$ 1991        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1992        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1993        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1994        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1995        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1996        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1997        : num [1:1200] 0.0478 0 0 0 0 ...
##   ..$ 1998        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1999        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2000        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2001        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2002        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2003        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2004        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2005        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2006        : num [1:1200] 0 0 0 0.179 0.424 ...
##   ..$ 2007        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2008        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2009        : num [1:1200] 0 0 0 0 0 ...
##   ..$ 2010        : num [1:1200] 0.355 0 0 0 0 ...
##   ..$ 2011        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2012        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2013        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2014        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2015        : num [1:1200] 0 0 0 0 0 ...
##   ..$ 2016        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2017        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2018        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2019        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2020        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2021        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ buffer_area : num [1:1200] 196260 196260 196260 196260 196260 ...
##   ..$ feature_area: num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...

From a quick glance everything looks good.
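
A quick glance at `str()` output can miss things when there are this many columns, so the two key columns can also be checked programmatically. This is a sketch on toy data; `has_expected_types()` is a hypothetical helper, and the expected types (factor for site_number, integer for buff_dist) are taken from the import code above.

```r
# hypothetical helper: TRUE when the two key columns have the expected types
has_expected_types <- function(df) {
  is.factor(df$site_number) && is.integer(df$buff_dist)
}

# toy list: one data frame with the expected types, one without
toy_list <- list(
  good = data.frame(site_number = factor(c('1', '2')),
                    buff_dist = c(250L, 500L)),
  bad = data.frame(site_number = c('1', '2'),
                   buff_dist = c(250, 500),
                   stringsAsFactors = FALSE)
)

# run the check on every element; vapply() guarantees a logical vector
type_check <- vapply(toy_list, has_expected_types, logical(1))
print(type_check)
```

With the real data, the same call would be `vapply(srfn_covariate_data, has_expected_types, logical(1))`.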

Sites

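Now let’s check that all the sites are accounted for. There should be 53, based on the sites that had GPS info in the GrizzleyRidge_camera file (note that the output below lists 60 levels, so the expected count is worth confirming against that file).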

# check that the sites are all there and entered correctly

# since the data sets are in a list we need to call the list first, then the data name in the list, then the column name
levels(srfn_covariate_data$HFI$site_number)
##  [1] "1"   "2"   "4"   "6"   "10"  "12"  "13"  "17"  "18"  "21"  "23"  "24" 
## [13] "26"  "30"  "31"  "35"  "37"  "38"  "39"  "42"  "44"  "45"  "46"  "51" 
## [25] "55"  "56"  "58"  "60"  "62"  "63"  "67"  "70"  "74"  "75"  "77"  "81" 
## [37] "83"  "88"  "94"  "95"  "99"  "100" "104" "105" "107" "110" "117" "118"
## [49] "119" "120" "121" "124" "125" "130" "131" "132" "136" "137" "140" "141"
levels(srfn_covariate_data$VEG$site_number)
##  [1] "1"   "2"   "4"   "6"   "10"  "12"  "13"  "17"  "18"  "21"  "23"  "24" 
## [13] "26"  "30"  "31"  "35"  "37"  "38"  "39"  "42"  "44"  "45"  "46"  "51" 
## [25] "55"  "56"  "58"  "60"  "62"  "63"  "67"  "70"  "74"  "75"  "77"  "81" 
## [37] "83"  "88"  "94"  "95"  "99"  "100" "104" "105" "107" "110" "117" "118"
## [49] "119" "120" "121" "124" "125" "130" "131" "132" "136" "137" "140" "141"
levels(srfn_covariate_data$harvest$site_number)
##  [1] "1"   "2"   "4"   "6"   "10"  "12"  "13"  "17"  "18"  "21"  "23"  "24" 
## [13] "26"  "30"  "31"  "35"  "37"  "38"  "39"  "42"  "44"  "45"  "46"  "51" 
## [25] "55"  "56"  "58"  "60"  "62"  "63"  "67"  "70"  "74"  "75"  "77"  "81" 
## [37] "83"  "88"  "94"  "95"  "99"  "100" "104" "105" "107" "110" "117" "118"
## [49] "119" "120" "121" "124" "125" "130" "131" "132" "136" "137" "140" "141"

We want to make sure the site names match the camera data, so let’s import the timelapse data from the last script (01_ACME_SRFN_camera….) to check.

All the sites look like they match up.
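
Comparing long lists of site numbers by eye is error-prone, so a set comparison is a useful complement. The sketch below uses toy site vectors and a hypothetical helper, `compare_sites()`; the camera data object from the previous script is not shown in this section, so its name and column are left for you to fill in.

```r
# hypothetical helper: list sites present in one source but not the other
compare_sites <- function(covariate_sites, camera_sites) {
  list(
    missing_from_covariates = setdiff(camera_sites, covariate_sites),
    missing_from_camera = setdiff(covariate_sites, camera_sites)
  )
}

# toy site vectors (made-up) with one mismatch in each direction
toy_covariate_sites <- c('1', '2', '4', '6')
toy_camera_sites <- c('1', '2', '4', '10')

site_check <- compare_sites(toy_covariate_sites, toy_camera_sites)
print(site_check)
```

With the real data, the first argument would be `levels(srfn_covariate_data$HFI$site_number)` and the second the site column of the camera data; two empty vectors in the result mean the sites match.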

Column names

We should check that the column names all look good. There are a ton for the HFI data frame, so we won’t look at each of the features individually, but we will check that the general formatting/naming is okay.

names(srfn_covariate_data$HFI)
##  [1] "site_number"                  "buff_dist"                   
##  [3] "airp-runway"                  "borrowpit-dry"               
##  [5] "borrowpit-wet"                "borrowpits"                  
##  [7] "buffer_area"                  "camp-industrial"             
##  [9] "campground"                   "canal"                       
## [11] "cfo"                          "clearing-unknown"            
## [13] "clearing-wellpad-unconfirmed" "conventional-seismic"        
## [15] "country-residence"            "crop"                        
## [17] "cultivation_abandoned"        "dugout"                      
## [19] "facility-other"               "facility-unknown"            
## [21] "fruit-vegetables"             "golfcourse"                  
## [23] "greenspace"                   "grvl-sand-pit"               
## [25] "harvest-area"                 "harvest-area-white-zone"     
## [27] "lagoon"                       "landfill"                    
## [29] "low-impact-seismic"           "mill"                        
## [31] "mines-pitlake"                "misc-oil-gas-facility"       
## [33] "oil-gas-plant"                "open-pit-mine"               
## [35] "pipeline"                     "recreation"                  
## [37] "reservoir"                    "residence_clearing"          
## [39] "rlwy-mlt-track"               "rlwy-sgl-track"              
## [41] "rlwy-spur"                    "road-gravel-1l"              
## [43] "road-gravel-2l"               "road-paved-1l"               
## [45] "road-paved-2l"                "road-paved-3l"               
## [47] "road-paved-4l"                "road-paved-div"              
## [49] "road-paved-undiv-1l"          "road-paved-undiv-2l"         
## [51] "road-unclassified"            "road-unimproved"             
## [53] "road-unpaved-2l"              "road-winter"                 
## [55] "rough_pasture"                "runway"                      
## [57] "rural-residence"              "sump"                        
## [59] "surrounding-veg"              "tame_pasture"                
## [61] "trail"                        "transfer_station"            
## [63] "transmission-line"            "truck-trail"                 
## [65] "urban-industrial"             "urban-residence"             
## [67] "vegetated-edge-railways"      "vegetated-edge-roads"        
## [69] "well_cleared_not_confirmed"   "well_cleared_not_drilled"    
## [71] "well-aband"                   "well-bitumen"                
## [73] "well-cased"                   "well-gas"                    
## [75] "well-oil"                     "well-other"                  
## [77] "well-unknown"

These look okay, but we should replace the dash ‘-’ with an underscore ‘_’ to match the formatting of other files and because it’s easier for R to work with. We will do this in a later step along with any other issues, because we don’t need it fixed now.
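
For reference, the dash-to-underscore replacement can be sketched as follows. This is an illustration on a few of the HFI column names using base R `gsub()`; it is not necessarily how the later step in this script does it.

```r
# a few of the HFI column names from the output above
hfi_names <- c('site_number',
               'airp-runway',
               'clearing-wellpad-unconfirmed',
               'well_cleared_not_drilled')

# fixed = TRUE treats '-' as a literal character rather than a regex
clean_names <- gsub('-', '_', hfi_names, fixed = TRUE)
print(clean_names)
```

Names that already use underscores are left unchanged. In a tidyverse pipeline the same replacement could be done with `stringr::str_replace_all()` inside `dplyr::rename_with()`.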

We also want to add array and camera columns which we can do using the site data.

Let’s check the VEG data too

names(srfn_covariate_data$VEG)
##  [1] "site_number" "buff_dist"   "110"         "120"         "20"         
##  [6] "210"         "220"         "230"         "33"          "34"         
## [11] "50"

And finally the harvest data

names(srfn_covariate_data$harvest)
##  [1] "site_number"  "buff_dist"    "1940"         "1950"         "1960"        
##  [6] "1966"         "1967"         "1968"         "1969"         "1970"        
## [11] "1971"         "1972"         "1973"         "1974"         "1975"        
## [16] "1976"         "1977"         "1978"         "1979"         "1980"        
## [21] "1981"         "1982"         "1983"         "1984"         "1985"        
## [26] "1986"         "1987"         "1988"         "1989"         "1990"        
## [31] "1991"         "1992"         "1993"         "1994"         "1995"        
## [36] "1996"         "1997"         "1998"         "1999"         "2000"        
## [41] "2001"         "2002"         "2003"         "2004"         "2005"        
## [46] "2006"         "2007"         "2008"         "2009"         "2010"        
## [51] "2011"         "2012"         "2013"         "2014"         "2015"        
## [56] "2016"         "2017"         "2018"         "2019"         "2020"        
## [61] "2021"         "buffer_area"  "feature_area"

NAs

Let’s check the summary for any NAs that shouldn’t be in the data. Mostly we are looking for NAs in the site_number or buff_dist columns.
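
Because `summary()` prints a lot of output for wide data frames, a per-column NA count can be a handy complement to it. This is a sketch on a toy data frame with made-up values and deliberately inserted NAs.

```r
# toy data frame with one NA in each of the two key columns
toy_hfi <- data.frame(
  site_number = factor(c('1', '2', NA)),
  buff_dist = c(250L, NA, 500L),
  pipeline = c(0, 0.148, 0.0148)
)

# count NAs per column, then keep only the columns that contain any
na_counts <- colSums(is.na(toy_hfi))
print(na_counts)
print(na_counts[na_counts > 0])
```

With the real data, `colSums(is.na(srfn_covariate_data$HFI))` gives the same per-column count.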

summary(srfn_covariate_data$HFI)
##   site_number     buff_dist     airp-runway borrowpit-dry      
##  1      :  20   Min.   : 250   Min.   :0    Min.   :0.0000000  
##  2      :  20   1st Qu.:1438   1st Qu.:0    1st Qu.:0.0000000  
##  4      :  20   Median :2625   Median :0    Median :0.0000000  
##  6      :  20   Mean   :2625   Mean   :0    Mean   :0.0004945  
##  10     :  20   3rd Qu.:3812   3rd Qu.:0    3rd Qu.:0.0005115  
##  12     :  20   Max.   :5000   Max.   :0    Max.   :0.0296372  
##  (Other):1080                                                  
##  borrowpit-wet         borrowpits         buffer_area       camp-industrial
##  Min.   :0.0000000   Min.   :0.000e+00   Min.   :  196260   Min.   :0      
##  1st Qu.:0.0000000   1st Qu.:0.000e+00   1st Qu.: 6525640   1st Qu.:0      
##  Median :0.0000000   Median :0.000e+00   Median :21686712   Median :0      
##  Mean   :0.0001462   Mean   :4.598e-05   Mean   :28163286   Mean   :0      
##  3rd Qu.:0.0000760   3rd Qu.:0.000e+00   3rd Qu.:45679477   3rd Qu.:0      
##  Max.   :0.0073210   Max.   :2.615e-03   Max.   :78503934   Max.   :0      
##                                                                            
##    campground            canal                cfo    clearing-unknown   
##  Min.   :0.000e+00   Min.   :0.0000000   Min.   :0   Min.   :0.000e+00  
##  1st Qu.:0.000e+00   1st Qu.:0.0000000   1st Qu.:0   1st Qu.:0.000e+00  
##  Median :0.000e+00   Median :0.0000000   Median :0   Median :4.240e-07  
##  Mean   :2.409e-06   Mean   :0.0002124   Mean   :0   Mean   :8.076e-04  
##  3rd Qu.:0.000e+00   3rd Qu.:0.0000000   3rd Qu.:0   3rd Qu.:1.124e-03  
##  Max.   :4.967e-04   Max.   :0.0076994   Max.   :0   Max.   :2.818e-02  
##                                                                         
##  clearing-wellpad-unconfirmed conventional-seismic country-residence 
##  Min.   :0.0000000            Min.   :0.000000     Min.   :0.000000  
##  1st Qu.:0.0000000            1st Qu.:0.001827     1st Qu.:0.000000  
##  Median :0.0000000            Median :0.003612     Median :0.000000  
##  Mean   :0.0001149            Mean   :0.004028     Mean   :0.000439  
##  3rd Qu.:0.0000000            3rd Qu.:0.005451     3rd Qu.:0.000000  
##  Max.   :0.0027152            Max.   :0.030028     Max.   :0.056385  
##                                                                      
##       crop         cultivation_abandoned     dugout         
##  Min.   :0.00000   Min.   :0.000000      Min.   :0.000e+00  
##  1st Qu.:0.00000   1st Qu.:0.000000      1st Qu.:0.000e+00  
##  Median :0.00000   Median :0.000000      Median :0.000e+00  
##  Mean   :0.02988   Mean   :0.001701      Mean   :2.309e-05  
##  3rd Qu.:0.00000   3rd Qu.:0.000000      3rd Qu.:0.000e+00  
##  Max.   :0.43283   Max.   :0.040084      Max.   :1.239e-03  
##                                                             
##  facility-other      facility-unknown    fruit-vegetables   golfcourse
##  Min.   :0.0000000   Min.   :0.0000000   Min.   :0        Min.   :0   
##  1st Qu.:0.0000000   1st Qu.:0.0000000   1st Qu.:0        1st Qu.:0   
##  Median :0.0000000   Median :0.0000000   Median :0        Median :0   
##  Mean   :0.0003137   Mean   :0.0000223   Mean   :0        Mean   :0   
##  3rd Qu.:0.0000000   3rd Qu.:0.0000000   3rd Qu.:0        3rd Qu.:0   
##  Max.   :0.0774805   Max.   :0.0064178   Max.   :0        Max.   :0   
##                                                                       
##    greenspace        grvl-sand-pit       harvest-area    
##  Min.   :0.000e+00   Min.   :0.000000   Min.   :0.00000  
##  1st Qu.:0.000e+00   1st Qu.:0.000000   1st Qu.:0.02588  
##  Median :0.000e+00   Median :0.000000   Median :0.23866  
##  Mean   :1.424e-05   Mean   :0.001116   Mean   :0.23873  
##  3rd Qu.:0.000e+00   3rd Qu.:0.000000   3rd Qu.:0.37536  
##  Max.   :2.346e-03   Max.   :0.416663   Max.   :0.98631  
##                                                          
##  harvest-area-white-zone     lagoon             landfill low-impact-seismic 
##  Min.   :0.00000         Min.   :0.000e+00   Min.   :0   Min.   :0.000e+00  
##  1st Qu.:0.00000         1st Qu.:0.000e+00   1st Qu.:0   1st Qu.:0.000e+00  
##  Median :0.00000         Median :0.000e+00   Median :0   Median :0.000e+00  
##  Mean   :0.01302         Mean   :3.106e-05   Mean   :0   Mean   :1.828e-05  
##  3rd Qu.:0.00000         3rd Qu.:0.000e+00   3rd Qu.:0   3rd Qu.:0.000e+00  
##  Max.   :0.80503         Max.   :4.257e-03   Max.   :0   Max.   :6.059e-03  
##                                                                             
##       mill   mines-pitlake misc-oil-gas-facility oil-gas-plant open-pit-mine
##  Min.   :0   Min.   :0     Min.   :0.0000000     Min.   :0     Min.   :0    
##  1st Qu.:0   1st Qu.:0     1st Qu.:0.0000000     1st Qu.:0     1st Qu.:0    
##  Median :0   Median :0     Median :0.0000000     Median :0     Median :0    
##  Mean   :0   Mean   :0     Mean   :0.0013619     Mean   :0     Mean   :0    
##  3rd Qu.:0   3rd Qu.:0     3rd Qu.:0.0007224     3rd Qu.:0     3rd Qu.:0    
##  Max.   :0   Max.   :0     Max.   :0.1313891     Max.   :0     Max.   :0    
##                                                                             
##     pipeline         recreation          reservoir         residence_clearing 
##  Min.   :0.00000   Min.   :0.000e+00   Min.   :0.000e+00   Min.   :0.0000000  
##  1st Qu.:0.00000   1st Qu.:0.000e+00   1st Qu.:0.000e+00   1st Qu.:0.0000000  
##  Median :0.00450   Median :0.000e+00   Median :0.000e+00   Median :0.0000000  
##  Mean   :0.01031   Mean   :6.623e-05   Mean   :8.539e-05   Mean   :0.0001049  
##  3rd Qu.:0.01523   3rd Qu.:0.000e+00   3rd Qu.:0.000e+00   3rd Qu.:0.0000000  
##  Max.   :0.14867   Max.   :7.941e-03   Max.   :1.393e-02   Max.   :0.0132461  
##                                                                               
##  rlwy-mlt-track rlwy-sgl-track        rlwy-spur road-gravel-1l     
##  Min.   :0      Min.   :0.0000000   Min.   :0   Min.   :0.0000000  
##  1st Qu.:0      1st Qu.:0.0000000   1st Qu.:0   1st Qu.:0.0006477  
##  Median :0      Median :0.0000000   Median :0   Median :0.0043887  
##  Mean   :0      Mean   :0.0001036   Mean   :0   Mean   :0.0056608  
##  3rd Qu.:0      3rd Qu.:0.0000000   3rd Qu.:0   3rd Qu.:0.0088573  
##  Max.   :0      Max.   :0.0036376   Max.   :0   Max.   :0.0598752  
##                                                                    
##  road-gravel-2l      road-paved-1l       road-paved-2l road-paved-3l
##  Min.   :0.000e+00   Min.   :0.000e+00   Min.   :0     Min.   :0    
##  1st Qu.:0.000e+00   1st Qu.:0.000e+00   1st Qu.:0     1st Qu.:0    
##  Median :0.000e+00   Median :0.000e+00   Median :0     Median :0    
##  Mean   :3.886e-05   Mean   :5.918e-06   Mean   :0     Mean   :0    
##  3rd Qu.:0.000e+00   3rd Qu.:0.000e+00   3rd Qu.:0     3rd Qu.:0    
##  Max.   :1.820e-03   Max.   :6.158e-04   Max.   :0     Max.   :0    
##                                                                     
##  road-paved-4l road-paved-div road-paved-undiv-1l road-paved-undiv-2l
##  Min.   :0     Min.   :0      Min.   :0.000e+00   Min.   :0.0000000  
##  1st Qu.:0     1st Qu.:0      1st Qu.:0.000e+00   1st Qu.:0.0000000  
##  Median :0     Median :0      Median :0.000e+00   Median :0.0000000  
##  Mean   :0     Mean   :0      Mean   :7.538e-06   Mean   :0.0005671  
##  3rd Qu.:0     3rd Qu.:0      3rd Qu.:0.000e+00   3rd Qu.:0.0000000  
##  Max.   :0     Max.   :0      Max.   :1.051e-03   Max.   :0.0066563  
##                                                                      
##  road-unclassified   road-unimproved     road-unpaved-2l  road-winter
##  Min.   :0.0000000   Min.   :0.0000000   Min.   :0       Min.   :0   
##  1st Qu.:0.0000000   1st Qu.:0.0001997   1st Qu.:0       1st Qu.:0   
##  Median :0.0000000   Median :0.0009214   Median :0       Median :0   
##  Mean   :0.0001274   Mean   :0.0011036   Mean   :0       Mean   :0   
##  3rd Qu.:0.0000000   3rd Qu.:0.0014730   3rd Qu.:0       3rd Qu.:0   
##  Max.   :0.0145510   Max.   :0.0237365   Max.   :0       Max.   :0   
##                                                                      
##  rough_pasture         runway          rural-residence         sump          
##  Min.   :0.00000   Min.   :0.0000000   Min.   :0.000000   Min.   :0.000e+00  
##  1st Qu.:0.00000   1st Qu.:0.0000000   1st Qu.:0.000000   1st Qu.:0.000e+00  
##  Median :0.00000   Median :0.0000000   Median :0.000000   Median :0.000e+00  
##  Mean   :0.01066   Mean   :0.0001223   Mean   :0.001884   Mean   :4.982e-05  
##  3rd Qu.:0.00000   3rd Qu.:0.0000000   3rd Qu.:0.000000   3rd Qu.:0.000e+00  
##  Max.   :0.28616   Max.   :0.0123446   Max.   :0.091914   Max.   :3.232e-03  
##                                                                              
##  surrounding-veg      tame_pasture        trail           transfer_station
##  Min.   :0.0000000   Min.   :0.0000   Min.   :0.0000000   Min.   :0       
##  1st Qu.:0.0000000   1st Qu.:0.0000   1st Qu.:0.0003476   1st Qu.:0       
##  Median :0.0000000   Median :0.0000   Median :0.0009790   Median :0       
##  Mean   :0.0001282   Mean   :0.0146   Mean   :0.0012570   Mean   :0       
##  3rd Qu.:0.0000000   3rd Qu.:0.0000   3rd Qu.:0.0018804   3rd Qu.:0       
##  Max.   :0.0346612   Max.   :0.2991   Max.   :0.0118693   Max.   :0       
##                                                                           
##  transmission-line    truck-trail        urban-industrial   
##  Min.   :0.0000000   Min.   :0.0000000   Min.   :0.000e+00  
##  1st Qu.:0.0000000   1st Qu.:0.0001198   1st Qu.:0.000e+00  
##  Median :0.0000000   Median :0.0006074   Median :0.000e+00  
##  Mean   :0.0011787   Mean   :0.0011931   Mean   :2.583e-05  
##  3rd Qu.:0.0003164   3rd Qu.:0.0015813   3rd Qu.:0.000e+00  
##  Max.   :0.0460439   Max.   :0.0823490   Max.   :4.045e-03  
##                                                             
##  urban-residence     vegetated-edge-railways vegetated-edge-roads
##  Min.   :0.0000000   Min.   :0.0000000       Min.   :0.000000    
##  1st Qu.:0.0000000   1st Qu.:0.0000000       1st Qu.:0.003433    
##  Median :0.0000000   Median :0.0000000       Median :0.012251    
##  Mean   :0.0001453   Mean   :0.0001874       Mean   :0.013592    
##  3rd Qu.:0.0000000   3rd Qu.:0.0000000       3rd Qu.:0.020822    
##  Max.   :0.0191791   Max.   :0.0049635       Max.   :0.099551    
##                                                                  
##  well_cleared_not_confirmed well_cleared_not_drilled   well-aband       
##  Min.   :0                  Min.   :0                Min.   :0.0000000  
##  1st Qu.:0                  1st Qu.:0                1st Qu.:0.0001567  
##  Median :0                  Median :0                Median :0.0019988  
##  Mean   :0                  Mean   :0                Mean   :0.0031103  
##  3rd Qu.:0                  3rd Qu.:0                3rd Qu.:0.0045361  
##  Max.   :0                  Max.   :0                Max.   :0.0437908  
##                                                                         
##   well-bitumen   well-cased           well-gas            well-oil       
##  Min.   :0     Min.   :0.000e+00   Min.   :0.000e+00   Min.   :0.000000  
##  1st Qu.:0     1st Qu.:0.000e+00   1st Qu.:0.000e+00   1st Qu.:0.000000  
##  Median :0     Median :0.000e+00   Median :0.000e+00   Median :0.003253  
##  Mean   :0     Mean   :7.166e-05   Mean   :5.196e-05   Mean   :0.005327  
##  3rd Qu.:0     3rd Qu.:0.000e+00   3rd Qu.:0.000e+00   3rd Qu.:0.009214  
##  Max.   :0     Max.   :3.125e-03   Max.   :1.857e-03   Max.   :0.095784  
##                                                                          
##    well-other        well-unknown
##  Min.   :0.000000   Min.   :0    
##  1st Qu.:0.000000   1st Qu.:0    
##  Median :0.000000   Median :0    
##  Mean   :0.001677   Mean   :0    
##  3rd Qu.:0.002377   3rd Qu.:0    
##  Max.   :0.032438   Max.   :0    
## 
summary(srfn_covariate_data$VEG)
##   site_number     buff_dist         110                120         
##  1      :  20   Min.   : 250   Min.   :0.000000   Min.   :0.00000  
##  2      :  20   1st Qu.:1438   1st Qu.:0.006635   1st Qu.:0.00000  
##  4      :  20   Median :2625   Median :0.034291   Median :0.00000  
##  6      :  20   Mean   :2625   Mean   :0.055123   Mean   :0.03587  
##  10     :  20   3rd Qu.:3812   3rd Qu.:0.068804   3rd Qu.:0.00000  
##  12     :  20   Max.   :5000   Max.   :0.883334   Max.   :0.49000  
##  (Other):1080                                                      
##        20               210               220              230         
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.03179   1st Qu.:0.1463   1st Qu.:0.00000  
##  Median :0.00000   Median :0.23137   Median :0.3010   Median :0.01965  
##  Mean   :0.06146   Mean   :0.23902   Mean   :0.3502   Mean   :0.04277  
##  3rd Qu.:0.03622   3rd Qu.:0.38303   3rd Qu.:0.5250   3rd Qu.:0.06350  
##  Max.   :0.84113   Max.   :0.84699   Max.   :1.0000   Max.   :0.93137  
##                                                                        
##        33                  34                50         
##  Min.   :0.000e+00   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.000e+00   1st Qu.:0.01782   1st Qu.:0.04624  
##  Median :0.000e+00   Median :0.05463   Median :0.10172  
##  Mean   :4.182e-05   Mean   :0.05948   Mean   :0.15602  
##  3rd Qu.:0.000e+00   3rd Qu.:0.08856   3rd Qu.:0.20080  
##  Max.   :3.641e-03   Max.   :0.45140   Max.   :0.93212  
## 
summary(srfn_covariate_data$harvest)
##   site_number     buff_dist         1940                1950         
##  1      :  20   Min.   : 250   Min.   :0.0000000   Min.   :0.000000  
##  2      :  20   1st Qu.:1438   1st Qu.:0.0000000   1st Qu.:0.000000  
##  4      :  20   Median :2625   Median :0.0000000   Median :0.000000  
##  6      :  20   Mean   :2625   Mean   :0.0002878   Mean   :0.005077  
##  10     :  20   3rd Qu.:3812   3rd Qu.:0.0000000   3rd Qu.:0.000000  
##  12     :  20   Max.   :5000   Max.   :0.0243877   Max.   :0.891286  
##  (Other):1080                                                        
##       1960               1966        1967                1968          
##  Min.   :0.000000   Min.   :0   Min.   :0.0000000   Min.   :0.0000000  
##  1st Qu.:0.000000   1st Qu.:0   1st Qu.:0.0000000   1st Qu.:0.0000000  
##  Median :0.000000   Median :0   Median :0.0000000   Median :0.0000000  
##  Mean   :0.004843   Mean   :0   Mean   :0.0001106   Mean   :0.0001209  
##  3rd Qu.:0.001549   3rd Qu.:0   3rd Qu.:0.0000000   3rd Qu.:0.0000000  
##  Max.   :0.125229   Max.   :0   Max.   :0.0150943   Max.   :0.0135532  
##                                                                        
##       1969                1970              1971        1972        1973  
##  Min.   :0.000e+00   Min.   :0.00000   Min.   :0   Min.   :0   Min.   :0  
##  1st Qu.:0.000e+00   1st Qu.:0.00000   1st Qu.:0   1st Qu.:0   1st Qu.:0  
##  Median :0.000e+00   Median :0.00000   Median :0   Median :0   Median :0  
##  Mean   :1.067e-05   Mean   :0.02033   Mean   :0   Mean   :0   Mean   :0  
##  3rd Qu.:0.000e+00   3rd Qu.:0.01921   3rd Qu.:0   3rd Qu.:0   3rd Qu.:0  
##  Max.   :1.691e-03   Max.   :0.87957   Max.   :0   Max.   :0   Max.   :0  
##                                                                           
##       1974        1975                1976                1977          
##  Min.   :0   Min.   :0.000e+00   Min.   :0.0000000   Min.   :0.000e+00  
##  1st Qu.:0   1st Qu.:0.000e+00   1st Qu.:0.0000000   1st Qu.:0.000e+00  
##  Median :0   Median :0.000e+00   Median :0.0000000   Median :0.000e+00  
##  Mean   :0   Mean   :5.914e-06   Mean   :0.0000021   Mean   :3.272e-07  
##  3rd Qu.:0   3rd Qu.:0.000e+00   3rd Qu.:0.0000000   3rd Qu.:0.000e+00  
##  Max.   :0   Max.   :1.430e-03   Max.   :0.0007532   Max.   :2.599e-04  
##                                                                         
##       1978        1979        1980               1981        1982  
##  Min.   :0   Min.   :0   Min.   :0.000000   Min.   :0   Min.   :0  
##  1st Qu.:0   1st Qu.:0   1st Qu.:0.000000   1st Qu.:0   1st Qu.:0  
##  Median :0   Median :0   Median :0.000000   Median :0   Median :0  
##  Mean   :0   Mean   :0   Mean   :0.017130   Mean   :0   Mean   :0  
##  3rd Qu.:0   3rd Qu.:0   3rd Qu.:0.003411   3rd Qu.:0   3rd Qu.:0  
##  Max.   :0   Max.   :0   Max.   :0.420122   Max.   :0   Max.   :0  
##                                                                    
##       1983               1984                1985               1986         
##  Min.   :0.000000   Min.   :0.000e+00   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.000e+00   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.000000   Median :0.000e+00   Median :0.000000   Median :0.000000  
##  Mean   :0.000197   Mean   :5.556e-05   Mean   :0.001827   Mean   :0.007432  
##  3rd Qu.:0.000000   3rd Qu.:0.000e+00   3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :0.011415   Max.   :7.692e-03   Max.   :0.087543   Max.   :0.197918  
##                                                                              
##       1987                1988               1989               1990        
##  Min.   :0.0000000   Min.   :0.000000   Min.   :0.000000   Min.   :0.00000  
##  1st Qu.:0.0000000   1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.00000  
##  Median :0.0000000   Median :0.000000   Median :0.000000   Median :0.00000  
##  Mean   :0.0003432   Mean   :0.001416   Mean   :0.002745   Mean   :0.02199  
##  3rd Qu.:0.0000000   3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.00939  
##  Max.   :0.0449292   Max.   :0.171834   Max.   :0.173129   Max.   :0.84354  
##                                                                             
##       1991        1992        1993                1994          
##  Min.   :0   Min.   :0   Min.   :0.0000000   Min.   :0.0000000  
##  1st Qu.:0   1st Qu.:0   1st Qu.:0.0000000   1st Qu.:0.0000000  
##  Median :0   Median :0   Median :0.0000000   Median :0.0000000  
##  Mean   :0   Mean   :0   Mean   :0.0002048   Mean   :0.0007679  
##  3rd Qu.:0   3rd Qu.:0   3rd Qu.:0.0000000   3rd Qu.:0.0000000  
##  Max.   :0   Max.   :0   Max.   :0.0205565   Max.   :0.0779967  
##                                                                 
##       1995                1996               1997               1998         
##  Min.   :0.000e+00   Min.   :0.000000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.000e+00   1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.000e+00   Median :0.000000   Median :0.000000   Median :0.000000  
##  Mean   :6.971e-05   Mean   :0.007337   Mean   :0.001736   Mean   :0.001915  
##  3rd Qu.:0.000e+00   3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :6.484e-03   Max.   :0.788790   Max.   :0.126973   Max.   :0.108919  
##                                                                              
##       1999                2000               2001                2002  
##  Min.   :0.0000000   Min.   :0.000000   Min.   :0.000e+00   Min.   :0  
##  1st Qu.:0.0000000   1st Qu.:0.000000   1st Qu.:0.000e+00   1st Qu.:0  
##  Median :0.0000000   Median :0.000000   Median :0.000e+00   Median :0  
##  Mean   :0.0004213   Mean   :0.007223   Mean   :7.072e-05   Mean   :0  
##  3rd Qu.:0.0000000   3rd Qu.:0.000000   3rd Qu.:0.000e+00   3rd Qu.:0  
##  Max.   :0.0388934   Max.   :0.393858   Max.   :8.372e-03   Max.   :0  
##                                                                        
##       2003               2004                2005               2006        
##  Min.   :0.000000   Min.   :0.0000000   Min.   :0.000000   Min.   :0.00000  
##  1st Qu.:0.000000   1st Qu.:0.0000000   1st Qu.:0.000000   1st Qu.:0.00000  
##  Median :0.000000   Median :0.0000000   Median :0.000000   Median :0.00000  
##  Mean   :0.008926   Mean   :0.0043836   Mean   :0.002526   Mean   :0.01975  
##  3rd Qu.:0.000000   3rd Qu.:0.0001052   3rd Qu.:0.000000   3rd Qu.:0.01886  
##  Max.   :0.280990   Max.   :0.0906410   Max.   :0.244374   Max.   :0.42386  
##                                                                             
##       2007                2008              2009              2010         
##  Min.   :0.0000000   Min.   :0.00000   Min.   :0.00000   Min.   :0.000000  
##  1st Qu.:0.0000000   1st Qu.:0.00000   1st Qu.:0.00000   1st Qu.:0.000000  
##  Median :0.0000000   Median :0.00000   Median :0.00000   Median :0.000000  
##  Mean   :0.0003268   Mean   :0.00844   Mean   :0.01501   Mean   :0.009539  
##  3rd Qu.:0.0000000   3rd Qu.:0.00000   3rd Qu.:0.01630   3rd Qu.:0.000000  
##  Max.   :0.0326652   Max.   :0.49764   Max.   :0.37049   Max.   :0.478107  
##                                                                            
##       2011               2012               2013               2014          
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.000000   Min.   :0.000e+00  
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000e+00  
##  Median :0.000000   Median :0.000000   Median :0.000000   Median :0.000e+00  
##  Mean   :0.008219   Mean   :0.002717   Mean   :0.004286   Mean   :8.902e-05  
##  3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.000e+00  
##  Max.   :0.237154   Max.   :0.103264   Max.   :0.289583   Max.   :4.485e-03  
##                                                                              
##       2015                2016               2017              2018         
##  Min.   :0.0000000   Min.   :0.000000   Min.   :0.00000   Min.   :0.000000  
##  1st Qu.:0.0000000   1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.000000  
##  Median :0.0000000   Median :0.000000   Median :0.00000   Median :0.000000  
##  Mean   :0.0130128   Mean   :0.000608   Mean   :0.01409   Mean   :0.003066  
##  3rd Qu.:0.0006807   3rd Qu.:0.000000   3rd Qu.:0.01322   3rd Qu.:0.000000  
##  Max.   :0.4669166   Max.   :0.037603   Max.   :0.19362   Max.   :0.359185  
##                                                                             
##       2019               2020              2021           buffer_area      
##  Min.   :0.000000   Min.   :0.00000   Min.   :0.000000   Min.   :  196260  
##  1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.: 6525640  
##  Median :0.000000   Median :0.00000   Median :0.000000   Median :21686712  
##  Mean   :0.006002   Mean   :0.00565   Mean   :0.008415   Mean   :28163286  
##  3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:0.000000   3rd Qu.:45679477  
##  Max.   :0.186836   Max.   :0.11843   Max.   :0.459053   Max.   :78503934  
##                                                                            
##   feature_area
##  Min.   :0    
##  1st Qu.:0    
##  Median :0    
##  Mean   :0    
##  3rd Qu.:0    
##  Max.   :0    
## 

Everything looks good!

Data formatting

As with the previous sections, this section will likely change each year, but it offers a good starting point. I do all the data manipulation in one code chunk, but I run each portion individually as I build the chunk to make sure it’s working.

This code will do the following data formatting on all files at once using purrr::map:

  1. Change the column names - replace dashes with underscores
  • no additional steps yet
srfn_covariate_data_fixed <- srfn_covariate_data %>% 
  
  map(
    ~.x %>% 
      
      set_names(
        names(.) %>% 
          
          # replace the '-' with '_' in the feature column names
          str_replace_all(pattern = '-', # provide the character pattern to look for (a plain '-' needs no escaping here)
                          replacement = '_')))
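
As a small illustration of the renaming step above, here is a minimal sketch on made-up data. The object `toy_covariates` and its values are hypothetical stand-ins for `srfn_covariate_data`, and `dplyr::rename_with()` is used here as an alternative to the `set_names()` approach in the chunk above; it is not the method used in this script.

```r
library(purrr)
library(dplyr)
library(stringr)
library(tibble)

# hypothetical toy list standing in for srfn_covariate_data
toy_covariates <- list(
  HFI = tibble(site_number = c('1', '2'),
               `well-oil` = c(0, 0.03),
               `vegetated-edge-roads` = c(0.01, 0.02)),
  VEG = tibble(site_number = c('1', '2'),
               `110` = c(0.2, 0.4))
)

# apply the same renaming function to every data frame in the list
toy_fixed <- map(toy_covariates, function(df) {
  # replace every '-' in the column names with '_'
  rename_with(df, ~ str_replace_all(.x, pattern = '-', replacement = '_'))
})

# print the new column names for each data frame
map(toy_fixed, names)
```

Columns without a dash (such as `site_number` or `110`) are left unchanged, so it should be safe to run the same step on every file in the list.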

Now let’s recheck the data, the data structure, and the site_numbers against the deployment data. You can run each of these individually, or all at once, and review each one.

# check structure of variables
str(srfn_covariate_data_fixed)
## List of 3
##  $ HFI    : tibble [1,200 × 77] (S3: tbl_df/tbl/data.frame)
##   ..$ site_number                 : Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ buff_dist                   : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
##   ..$ airp_runway                 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ borrowpit_dry               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ borrowpit_wet               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ borrowpits                  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ buffer_area                 : num [1:1200] 196260 196260 196260 196260 196260 ...
##   ..$ camp_industrial             : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ campground                  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ canal                       : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ cfo                         : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ clearing_unknown            : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ clearing_wellpad_unconfirmed: num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ conventional_seismic        : num [1:1200] 0.00 5.41e-05 0.00 0.00 0.00 ...
##   ..$ country_residence           : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ crop                        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ cultivation_abandoned       : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ dugout                      : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ facility_other              : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ facility_unknown            : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ fruit_vegetables            : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ golfcourse                  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ greenspace                  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ grvl_sand_pit               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ harvest_area                : num [1:1200] 0.432 0.342 0 0.388 0.424 ...
##   ..$ harvest_area_white_zone     : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ lagoon                      : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ landfill                    : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ low_impact_seismic          : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ mill                        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ mines_pitlake               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ misc_oil_gas_facility       : num [1:1200] 0 0.131 0 0 0 ...
##   ..$ oil_gas_plant               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ open_pit_mine               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ pipeline                    : num [1:1200] 0 0.148 0.0148 0 0 ...
##   ..$ recreation                  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ reservoir                   : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ residence_clearing          : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ rlwy_mlt_track              : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ rlwy_sgl_track              : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ rlwy_spur                   : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road_gravel_1l              : num [1:1200] 0.00 5.99e-02 7.05e-03 7.11e-06 0.00 ...
##   ..$ road_gravel_2l              : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road_paved_1l               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road_paved_2l               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road_paved_3l               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road_paved_4l               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road_paved_div              : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road_paved_undiv_1l         : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road_paved_undiv_2l         : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road_unclassified           : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road_unimproved             : num [1:1200] 0 0 0 0 0.00675 ...
##   ..$ road_unpaved_2l             : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ road_winter                 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ rough_pasture               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ runway                      : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ rural_residence             : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ sump                        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ surrounding_veg             : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ tame_pasture                : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ trail                       : num [1:1200] 0 0 0.011 0 0 ...
##   ..$ transfer_station            : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ transmission_line           : num [1:1200] 0 0 0 0 0 ...
##   ..$ truck_trail                 : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ urban_industrial            : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ urban_residence             : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ vegetated_edge_railways     : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ vegetated_edge_roads        : num [1:1200] 0 0.09955 0.0129 0.00112 0.01425 ...
##   ..$ well_cleared_not_confirmed  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ well_cleared_not_drilled    : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ well_aband                  : num [1:1200] 0 0 0 0 0 ...
##   ..$ well_bitumen                : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ well_cased                  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ well_gas                    : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ well_oil                    : num [1:1200] 0 0 0 0 0.0332 ...
##   ..$ well_other                  : num [1:1200] 0 0 0.0183 0.0318 0 ...
##   ..$ well_unknown                : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ VEG    : tibble [1,200 × 11] (S3: tbl_df/tbl/data.frame)
##   ..$ site_number: Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ buff_dist  : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
##   ..$ 110        : num [1:1200] 0 0.3608 0.0618 0 0 ...
##   ..$ 120        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 20         : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 210        : num [1:1200] 0.847 0 0.743 0.442 0.284 ...
##   ..$ 220        : num [1:1200] 0 0.18 0 0 0 ...
##   ..$ 230        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 33         : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 34         : num [1:1200] 0 0.4514 0.0716 0.00837 0.04522 ...
##   ..$ 50         : num [1:1200] 0.15301 0.00776 0.12401 0.54941 0.6703 ...
##  $ harvest: tibble [1,200 × 63] (S3: tbl_df/tbl/data.frame)
##   ..$ site_number : Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
##   ..$ buff_dist   : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
##   ..$ 1940        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1950        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1960        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1966        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1967        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1968        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1969        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1970        : num [1:1200] 0 0.342 0 0.209 0 ...
##   ..$ 1971        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1972        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1973        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1974        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1975        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1976        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1977        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1978        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1979        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1980        : num [1:1200] 0 0 0 0 0 ...
##   ..$ 1981        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1982        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1983        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1984        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1985        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1986        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1987        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1988        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1989        : num [1:1200] 0.0285 0 0 0 0 ...
##   ..$ 1990        : num [1:1200] 0 0 0 0 0 ...
##   ..$ 1991        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1992        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1993        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1994        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1995        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1996        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1997        : num [1:1200] 0.0478 0 0 0 0 ...
##   ..$ 1998        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 1999        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2000        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2001        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2002        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2003        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2004        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2005        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2006        : num [1:1200] 0 0 0 0.179 0.424 ...
##   ..$ 2007        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2008        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2009        : num [1:1200] 0 0 0 0 0 ...
##   ..$ 2010        : num [1:1200] 0.355 0 0 0 0 ...
##   ..$ 2011        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2012        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2013        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2014        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2015        : num [1:1200] 0 0 0 0 0 ...
##   ..$ 2016        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2017        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2018        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2019        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2020        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ 2021        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##   ..$ buffer_area : num [1:1200] 196260 196260 196260 196260 196260 ...
##   ..$ feature_area: num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
# take a look at the column names
names(srfn_covariate_data_fixed$HFI)
##  [1] "site_number"                  "buff_dist"                   
##  [3] "airp_runway"                  "borrowpit_dry"               
##  [5] "borrowpit_wet"                "borrowpits"                  
##  [7] "buffer_area"                  "camp_industrial"             
##  [9] "campground"                   "canal"                       
## [11] "cfo"                          "clearing_unknown"            
## [13] "clearing_wellpad_unconfirmed" "conventional_seismic"        
## [15] "country_residence"            "crop"                        
## [17] "cultivation_abandoned"        "dugout"                      
## [19] "facility_other"               "facility_unknown"            
## [21] "fruit_vegetables"             "golfcourse"                  
## [23] "greenspace"                   "grvl_sand_pit"               
## [25] "harvest_area"                 "harvest_area_white_zone"     
## [27] "lagoon"                       "landfill"                    
## [29] "low_impact_seismic"           "mill"                        
## [31] "mines_pitlake"                "misc_oil_gas_facility"       
## [33] "oil_gas_plant"                "open_pit_mine"               
## [35] "pipeline"                     "recreation"                  
## [37] "reservoir"                    "residence_clearing"          
## [39] "rlwy_mlt_track"               "rlwy_sgl_track"              
## [41] "rlwy_spur"                    "road_gravel_1l"              
## [43] "road_gravel_2l"               "road_paved_1l"               
## [45] "road_paved_2l"                "road_paved_3l"               
## [47] "road_paved_4l"                "road_paved_div"              
## [49] "road_paved_undiv_1l"          "road_paved_undiv_2l"         
## [51] "road_unclassified"            "road_unimproved"             
## [53] "road_unpaved_2l"              "road_winter"                 
## [55] "rough_pasture"                "runway"                      
## [57] "rural_residence"              "sump"                        
## [59] "surrounding_veg"              "tame_pasture"                
## [61] "trail"                        "transfer_station"            
## [63] "transmission_line"            "truck_trail"                 
## [65] "urban_industrial"             "urban_residence"             
## [67] "vegetated_edge_railways"      "vegetated_edge_roads"        
## [69] "well_cleared_not_confirmed"   "well_cleared_not_drilled"    
## [71] "well_aband"                   "well_bitumen"                
## [73] "well_cased"                   "well_gas"                    
## [75] "well_oil"                     "well_other"                  
## [77] "well_unknown"
names(srfn_covariate_data_fixed$VEG)
##  [1] "site_number" "buff_dist"   "110"         "120"         "20"         
##  [6] "210"         "220"         "230"         "33"          "34"         
## [11] "50"
names(srfn_covariate_data_fixed$harvest)
##  [1] "site_number"  "buff_dist"    "1940"         "1950"         "1960"        
##  [6] "1966"         "1967"         "1968"         "1969"         "1970"        
## [11] "1971"         "1972"         "1973"         "1974"         "1975"        
## [16] "1976"         "1977"         "1978"         "1979"         "1980"        
## [21] "1981"         "1982"         "1983"         "1984"         "1985"        
## [26] "1986"         "1987"         "1988"         "1989"         "1990"        
## [31] "1991"         "1992"         "1993"         "1994"         "1995"        
## [36] "1996"         "1997"         "1998"         "1999"         "2000"        
## [41] "2001"         "2002"         "2003"         "2004"         "2005"        
## [46] "2006"         "2007"         "2008"         "2009"         "2010"        
## [51] "2011"         "2012"         "2013"         "2014"         "2015"        
## [56] "2016"         "2017"         "2018"         "2019"         "2020"        
## [61] "2021"         "buffer_area"  "feature_area"
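
Reading through long `str()` and `names()` output by eye can miss things, so the same checks could also be done programmatically. The sketch below uses a hypothetical toy list (`toy_fixed`) in place of `srfn_covariate_data_fixed`; the object names `no_dashes` and `same_sites` are made up for illustration.

```r
library(purrr)
library(stringr)
library(tibble)

# hypothetical toy list standing in for srfn_covariate_data_fixed
toy_fixed <- list(
  HFI = tibble(site_number = factor(c('1', '2')),
               buff_dist = c(250L, 250L),
               well_oil = c(0, 0.03)),
  VEG = tibble(site_number = factor(c('1', '2')),
               buff_dist = c(250L, 250L),
               `110` = c(0.2, 0.4))
)

# TRUE for each data frame whose column names contain no dashes
no_dashes <- map_lgl(toy_fixed,
                     ~ !any(str_detect(names(.x), '-')))

# TRUE for each data frame whose site_number levels match the first data frame
same_sites <- map_lgl(toy_fixed,
                      ~ setequal(levels(.x$site_number),
                                 levels(toy_fixed[[1]]$site_number)))

no_dashes
same_sites
```

If either vector contains a `FALSE`, the name of that element shows which file needs another look.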

Join covariate data

Now we need to join the three files together.

covariates_all <- srfn_covariate_data_fixed$HFI %>% 
  
  # use full_join in case there are any issues with missing observations, but we should be fine since we checked the site_number names
  full_join(srfn_covariate_data_fixed$VEG,
            by = c('site_number', 'buff_dist')) %>% 
  
  full_join(srfn_covariate_data_fixed$harvest,
            by = c('site_number', 'buff_dist')) 
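
One thing to keep in mind with these joins: `buffer_area` appears in both the HFI and harvest column names, and when a non-key column exists in both tables `full_join()` keeps both copies and adds the suffixes `.x` and `.y`. The sketch below shows this on made-up data, along with an `anti_join()` check for unmatched rows. The objects `toy_hfi` and `toy_harvest` and their values are hypothetical.

```r
library(dplyr)
library(tibble)

# hypothetical toy tables that share the key columns and a buffer_area column
toy_hfi <- tibble(site_number = factor(c('1', '2')),
                  buff_dist = c(250L, 250L),
                  buffer_area = c(196260, 196260),
                  pipeline = c(0, 0.148))

toy_harvest <- tibble(site_number = factor(c('1', '2')),
                      buff_dist = c(250L, 250L),
                      `2010` = c(0.355, 0),
                      buffer_area = c(196260, 196260))

# join on the two key columns; the shared non-key column gets .x/.y suffixes
toy_joined <- full_join(toy_hfi, toy_harvest,
                        by = c('site_number', 'buff_dist'))

names(toy_joined)

# rows in toy_hfi with no match in toy_harvest (0 rows means all keys matched)
unmatched <- anti_join(toy_hfi, toy_harvest,
                       by = c('site_number', 'buff_dist'))
nrow(unmatched)

# check whether the two copies of buffer_area hold the same values
identical(toy_joined$buffer_area.x, toy_joined$buffer_area.y)
```

If the two copies turn out to be identical in the real data, one of them could be dropped and the other renamed; that choice is left to whoever is running the script.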


head(covariates_all)
## # A tibble: 6 × 147
##   site_number buff_dist airp_runway borrowpit_dry borrowpit_wet borrowpits
##   <fct>           <int>       <dbl>         <dbl>         <dbl>      <dbl>
## 1 1                 250           0             0             0          0
## 2 2                 250           0             0             0          0
## 3 4                 250           0             0             0          0
## 4 6                 250           0             0             0          0
## 5 10                250           0             0             0          0
## 6 12                250           0             0             0          0
## # ℹ 141 more variables: buffer_area.x <dbl>, camp_industrial <dbl>,
## #   campground <dbl>, canal <dbl>, cfo <dbl>, clearing_unknown <dbl>,
## #   clearing_wellpad_unconfirmed <dbl>, conventional_seismic <dbl>,
## #   country_residence <dbl>, crop <dbl>, cultivation_abandoned <dbl>,
## #   dugout <dbl>, facility_other <dbl>, facility_unknown <dbl>,
## #   fruit_vegetables <dbl>, golfcourse <dbl>, greenspace <dbl>,
## #   grvl_sand_pit <dbl>, harvest_area <dbl>, harvest_area_white_zone <dbl>, …
summary(covariates_all)
##   site_number     buff_dist     airp_runway borrowpit_dry      
##  1      :  20   Min.   : 250   Min.   :0    Min.   :0.0000000  
##  2      :  20   1st Qu.:1438   1st Qu.:0    1st Qu.:0.0000000  
##  4      :  20   Median :2625   Median :0    Median :0.0000000  
##  6      :  20   Mean   :2625   Mean   :0    Mean   :0.0004945  
##  10     :  20   3rd Qu.:3812   3rd Qu.:0    3rd Qu.:0.0005115  
##  12     :  20   Max.   :5000   Max.   :0    Max.   :0.0296372  
##  (Other):1080                                                  
##  borrowpit_wet         borrowpits        buffer_area.x      camp_industrial
##  Min.   :0.0000000   Min.   :0.000e+00   Min.   :  196260   Min.   :0      
##  1st Qu.:0.0000000   1st Qu.:0.000e+00   1st Qu.: 6525640   1st Qu.:0      
##  Median :0.0000000   Median :0.000e+00   Median :21686712   Median :0      
##  Mean   :0.0001462   Mean   :4.598e-05   Mean   :28163286   Mean   :0      
##  3rd Qu.:0.0000760   3rd Qu.:0.000e+00   3rd Qu.:45679477   3rd Qu.:0      
##  Max.   :0.0073210   Max.   :2.615e-03   Max.   :78503934   Max.   :0      
##                                                                            
##    campground            canal                cfo    clearing_unknown   
##  Min.   :0.000e+00   Min.   :0.0000000   Min.   :0   Min.   :0.000e+00  
##  1st Qu.:0.000e+00   1st Qu.:0.0000000   1st Qu.:0   1st Qu.:0.000e+00  
##  Median :0.000e+00   Median :0.0000000   Median :0   Median :4.240e-07  
##  Mean   :2.409e-06   Mean   :0.0002124   Mean   :0   Mean   :8.076e-04  
##  3rd Qu.:0.000e+00   3rd Qu.:0.0000000   3rd Qu.:0   3rd Qu.:1.124e-03  
##  Max.   :4.967e-04   Max.   :0.0076994   Max.   :0   Max.   :2.818e-02  
##                                                                         
##  clearing_wellpad_unconfirmed conventional_seismic country_residence 
##  Min.   :0.0000000            Min.   :0.000000     Min.   :0.000000  
##  1st Qu.:0.0000000            1st Qu.:0.001827     1st Qu.:0.000000  
##  Median :0.0000000            Median :0.003612     Median :0.000000  
##  Mean   :0.0001149            Mean   :0.004028     Mean   :0.000439  
##  3rd Qu.:0.0000000            3rd Qu.:0.005451     3rd Qu.:0.000000  
##  Max.   :0.0027152            Max.   :0.030028     Max.   :0.056385  
##                                                                      
##       crop         cultivation_abandoned     dugout         
##  Min.   :0.00000   Min.   :0.000000      Min.   :0.000e+00  
##  1st Qu.:0.00000   1st Qu.:0.000000      1st Qu.:0.000e+00  
##  Median :0.00000   Median :0.000000      Median :0.000e+00  
##  Mean   :0.02988   Mean   :0.001701      Mean   :2.309e-05  
##  3rd Qu.:0.00000   3rd Qu.:0.000000      3rd Qu.:0.000e+00  
##  Max.   :0.43283   Max.   :0.040084      Max.   :1.239e-03  
##                                                             
##  facility_other      facility_unknown    fruit_vegetables   golfcourse
##  Min.   :0.0000000   Min.   :0.0000000   Min.   :0        Min.   :0   
##  1st Qu.:0.0000000   1st Qu.:0.0000000   1st Qu.:0        1st Qu.:0   
##  Median :0.0000000   Median :0.0000000   Median :0        Median :0   
##  Mean   :0.0003137   Mean   :0.0000223   Mean   :0        Mean   :0   
##  3rd Qu.:0.0000000   3rd Qu.:0.0000000   3rd Qu.:0        3rd Qu.:0   
##  Max.   :0.0774805   Max.   :0.0064178   Max.   :0        Max.   :0   
##                                                                       
##    greenspace        grvl_sand_pit       harvest_area    
##  Min.   :0.000e+00   Min.   :0.000000   Min.   :0.00000  
##  1st Qu.:0.000e+00   1st Qu.:0.000000   1st Qu.:0.02588  
##  Median :0.000e+00   Median :0.000000   Median :0.23866  
##  Mean   :1.424e-05   Mean   :0.001116   Mean   :0.23873  
##  3rd Qu.:0.000e+00   3rd Qu.:0.000000   3rd Qu.:0.37536  
##  Max.   :2.346e-03   Max.   :0.416663   Max.   :0.98631  
##                                                          
##  harvest_area_white_zone     lagoon             landfill low_impact_seismic 
##  Min.   :0.00000         Min.   :0.000e+00   Min.   :0   Min.   :0.000e+00  
##  1st Qu.:0.00000         1st Qu.:0.000e+00   1st Qu.:0   1st Qu.:0.000e+00  
##  Median :0.00000         Median :0.000e+00   Median :0   Median :0.000e+00  
##  Mean   :0.01302         Mean   :3.106e-05   Mean   :0   Mean   :1.828e-05  
##  3rd Qu.:0.00000         3rd Qu.:0.000e+00   3rd Qu.:0   3rd Qu.:0.000e+00  
##  Max.   :0.80503         Max.   :4.257e-03   Max.   :0   Max.   :6.059e-03  
##                                                                             
##       mill   mines_pitlake misc_oil_gas_facility oil_gas_plant open_pit_mine
##  Min.   :0   Min.   :0     Min.   :0.0000000     Min.   :0     Min.   :0    
##  1st Qu.:0   1st Qu.:0     1st Qu.:0.0000000     1st Qu.:0     1st Qu.:0    
##  Median :0   Median :0     Median :0.0000000     Median :0     Median :0    
##  Mean   :0   Mean   :0     Mean   :0.0013619     Mean   :0     Mean   :0    
##  3rd Qu.:0   3rd Qu.:0     3rd Qu.:0.0007224     3rd Qu.:0     3rd Qu.:0    
##  Max.   :0   Max.   :0     Max.   :0.1313891     Max.   :0     Max.   :0    
##                                                                             
##     pipeline         recreation          reservoir         residence_clearing 
##  Min.   :0.00000   Min.   :0.000e+00   Min.   :0.000e+00   Min.   :0.0000000  
##  1st Qu.:0.00000   1st Qu.:0.000e+00   1st Qu.:0.000e+00   1st Qu.:0.0000000  
##  Median :0.00450   Median :0.000e+00   Median :0.000e+00   Median :0.0000000  
##  Mean   :0.01031   Mean   :6.623e-05   Mean   :8.539e-05   Mean   :0.0001049  
##  3rd Qu.:0.01523   3rd Qu.:0.000e+00   3rd Qu.:0.000e+00   3rd Qu.:0.0000000  
##  Max.   :0.14867   Max.   :7.941e-03   Max.   :1.393e-02   Max.   :0.0132461  
##                                                                               
##  rlwy_mlt_track rlwy_sgl_track        rlwy_spur road_gravel_1l     
##  Min.   :0      Min.   :0.0000000   Min.   :0   Min.   :0.0000000  
##  1st Qu.:0      1st Qu.:0.0000000   1st Qu.:0   1st Qu.:0.0006477  
##  Median :0      Median :0.0000000   Median :0   Median :0.0043887  
##  Mean   :0      Mean   :0.0001036   Mean   :0   Mean   :0.0056608  
##  3rd Qu.:0      3rd Qu.:0.0000000   3rd Qu.:0   3rd Qu.:0.0088573  
##  Max.   :0      Max.   :0.0036376   Max.   :0   Max.   :0.0598752  
##                                                                    
##  road_gravel_2l      road_paved_1l       road_paved_2l road_paved_3l
##  Min.   :0.000e+00   Min.   :0.000e+00   Min.   :0     Min.   :0    
##  1st Qu.:0.000e+00   1st Qu.:0.000e+00   1st Qu.:0     1st Qu.:0    
##  Median :0.000e+00   Median :0.000e+00   Median :0     Median :0    
##  Mean   :3.886e-05   Mean   :5.918e-06   Mean   :0     Mean   :0    
##  3rd Qu.:0.000e+00   3rd Qu.:0.000e+00   3rd Qu.:0     3rd Qu.:0    
##  Max.   :1.820e-03   Max.   :6.158e-04   Max.   :0     Max.   :0    
##                                                                     
##  road_paved_4l road_paved_div road_paved_undiv_1l road_paved_undiv_2l
##  Min.   :0     Min.   :0      Min.   :0.000e+00   Min.   :0.0000000  
##  1st Qu.:0     1st Qu.:0      1st Qu.:0.000e+00   1st Qu.:0.0000000  
##  Median :0     Median :0      Median :0.000e+00   Median :0.0000000  
##  Mean   :0     Mean   :0      Mean   :7.538e-06   Mean   :0.0005671  
##  3rd Qu.:0     3rd Qu.:0      3rd Qu.:0.000e+00   3rd Qu.:0.0000000  
##  Max.   :0     Max.   :0      Max.   :1.051e-03   Max.   :0.0066563  
##                                                                      
##  road_unclassified   road_unimproved     road_unpaved_2l  road_winter
##  Min.   :0.0000000   Min.   :0.0000000   Min.   :0       Min.   :0   
##  1st Qu.:0.0000000   1st Qu.:0.0001997   1st Qu.:0       1st Qu.:0   
##  Median :0.0000000   Median :0.0009214   Median :0       Median :0   
##  Mean   :0.0001274   Mean   :0.0011036   Mean   :0       Mean   :0   
##  3rd Qu.:0.0000000   3rd Qu.:0.0014730   3rd Qu.:0       3rd Qu.:0   
##  Max.   :0.0145510   Max.   :0.0237365   Max.   :0       Max.   :0   
##                                                                      
##  rough_pasture         runway          rural_residence         sump          
##  Min.   :0.00000   Min.   :0.0000000   Min.   :0.000000   Min.   :0.000e+00  
##  1st Qu.:0.00000   1st Qu.:0.0000000   1st Qu.:0.000000   1st Qu.:0.000e+00  
##  Median :0.00000   Median :0.0000000   Median :0.000000   Median :0.000e+00  
##  Mean   :0.01066   Mean   :0.0001223   Mean   :0.001884   Mean   :4.982e-05  
##  3rd Qu.:0.00000   3rd Qu.:0.0000000   3rd Qu.:0.000000   3rd Qu.:0.000e+00  
##  Max.   :0.28616   Max.   :0.0123446   Max.   :0.091914   Max.   :3.232e-03  
##                                                                              
##  surrounding_veg      tame_pasture        trail           transfer_station
##  Min.   :0.0000000   Min.   :0.0000   Min.   :0.0000000   Min.   :0       
##  1st Qu.:0.0000000   1st Qu.:0.0000   1st Qu.:0.0003476   1st Qu.:0       
##  Median :0.0000000   Median :0.0000   Median :0.0009790   Median :0       
##  Mean   :0.0001282   Mean   :0.0146   Mean   :0.0012570   Mean   :0       
##  3rd Qu.:0.0000000   3rd Qu.:0.0000   3rd Qu.:0.0018804   3rd Qu.:0       
##  Max.   :0.0346612   Max.   :0.2991   Max.   :0.0118693   Max.   :0       
##                                                                           
##  transmission_line    truck_trail        urban_industrial   
##  Min.   :0.0000000   Min.   :0.0000000   Min.   :0.000e+00  
##  1st Qu.:0.0000000   1st Qu.:0.0001198   1st Qu.:0.000e+00  
##  Median :0.0000000   Median :0.0006074   Median :0.000e+00  
##  Mean   :0.0011787   Mean   :0.0011931   Mean   :2.583e-05  
##  3rd Qu.:0.0003164   3rd Qu.:0.0015813   3rd Qu.:0.000e+00  
##  Max.   :0.0460439   Max.   :0.0823490   Max.   :4.045e-03  
##                                                             
##  urban_residence     vegetated_edge_railways vegetated_edge_roads
##  Min.   :0.0000000   Min.   :0.0000000       Min.   :0.000000    
##  1st Qu.:0.0000000   1st Qu.:0.0000000       1st Qu.:0.003433    
##  Median :0.0000000   Median :0.0000000       Median :0.012251    
##  Mean   :0.0001453   Mean   :0.0001874       Mean   :0.013592    
##  3rd Qu.:0.0000000   3rd Qu.:0.0000000       3rd Qu.:0.020822    
##  Max.   :0.0191791   Max.   :0.0049635       Max.   :0.099551    
##                                                                  
##  well_cleared_not_confirmed well_cleared_not_drilled   well_aband       
##  Min.   :0                  Min.   :0                Min.   :0.0000000  
##  1st Qu.:0                  1st Qu.:0                1st Qu.:0.0001567  
##  Median :0                  Median :0                Median :0.0019988  
##  Mean   :0                  Mean   :0                Mean   :0.0031103  
##  3rd Qu.:0                  3rd Qu.:0                3rd Qu.:0.0045361  
##  Max.   :0                  Max.   :0                Max.   :0.0437908  
##                                                                         
##   well_bitumen   well_cased           well_gas            well_oil       
##  Min.   :0     Min.   :0.000e+00   Min.   :0.000e+00   Min.   :0.000000  
##  1st Qu.:0     1st Qu.:0.000e+00   1st Qu.:0.000e+00   1st Qu.:0.000000  
##  Median :0     Median :0.000e+00   Median :0.000e+00   Median :0.003253  
##  Mean   :0     Mean   :7.166e-05   Mean   :5.196e-05   Mean   :0.005327  
##  3rd Qu.:0     3rd Qu.:0.000e+00   3rd Qu.:0.000e+00   3rd Qu.:0.009214  
##  Max.   :0     Max.   :3.125e-03   Max.   :1.857e-03   Max.   :0.095784  
##                                                                          
##    well_other        well_unknown      110                120         
##  Min.   :0.000000   Min.   :0     Min.   :0.000000   Min.   :0.00000  
##  1st Qu.:0.000000   1st Qu.:0     1st Qu.:0.006635   1st Qu.:0.00000  
##  Median :0.000000   Median :0     Median :0.034291   Median :0.00000  
##  Mean   :0.001677   Mean   :0     Mean   :0.055123   Mean   :0.03587  
##  3rd Qu.:0.002377   3rd Qu.:0     3rd Qu.:0.068804   3rd Qu.:0.00000  
##  Max.   :0.032438   Max.   :0     Max.   :0.883334   Max.   :0.49000  
##                                                                       
##        20               210               220              230         
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.0000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.03179   1st Qu.:0.1463   1st Qu.:0.00000  
##  Median :0.00000   Median :0.23137   Median :0.3010   Median :0.01965  
##  Mean   :0.06146   Mean   :0.23902   Mean   :0.3502   Mean   :0.04277  
##  3rd Qu.:0.03622   3rd Qu.:0.38303   3rd Qu.:0.5250   3rd Qu.:0.06350  
##  Max.   :0.84113   Max.   :0.84699   Max.   :1.0000   Max.   :0.93137  
##                                                                        
##        33                  34                50               1940          
##  Min.   :0.000e+00   Min.   :0.00000   Min.   :0.00000   Min.   :0.0000000  
##  1st Qu.:0.000e+00   1st Qu.:0.01782   1st Qu.:0.04624   1st Qu.:0.0000000  
##  Median :0.000e+00   Median :0.05463   Median :0.10172   Median :0.0000000  
##  Mean   :4.182e-05   Mean   :0.05948   Mean   :0.15602   Mean   :0.0002878  
##  3rd Qu.:0.000e+00   3rd Qu.:0.08856   3rd Qu.:0.20080   3rd Qu.:0.0000000  
##  Max.   :3.641e-03   Max.   :0.45140   Max.   :0.93212   Max.   :0.0243877  
##                                                                             
##       1950               1960               1966        1967          
##  Min.   :0.000000   Min.   :0.000000   Min.   :0   Min.   :0.0000000  
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0   1st Qu.:0.0000000  
##  Median :0.000000   Median :0.000000   Median :0   Median :0.0000000  
##  Mean   :0.005077   Mean   :0.004843   Mean   :0   Mean   :0.0001106  
##  3rd Qu.:0.000000   3rd Qu.:0.001549   3rd Qu.:0   3rd Qu.:0.0000000  
##  Max.   :0.891286   Max.   :0.125229   Max.   :0   Max.   :0.0150943  
##                                                                       
##       1968                1969                1970              1971  
##  Min.   :0.0000000   Min.   :0.000e+00   Min.   :0.00000   Min.   :0  
##  1st Qu.:0.0000000   1st Qu.:0.000e+00   1st Qu.:0.00000   1st Qu.:0  
##  Median :0.0000000   Median :0.000e+00   Median :0.00000   Median :0  
##  Mean   :0.0001209   Mean   :1.067e-05   Mean   :0.02033   Mean   :0  
##  3rd Qu.:0.0000000   3rd Qu.:0.000e+00   3rd Qu.:0.01921   3rd Qu.:0  
##  Max.   :0.0135532   Max.   :1.691e-03   Max.   :0.87957   Max.   :0  
##                                                                       
##       1972        1973        1974        1975                1976          
##  Min.   :0   Min.   :0   Min.   :0   Min.   :0.000e+00   Min.   :0.0000000  
##  1st Qu.:0   1st Qu.:0   1st Qu.:0   1st Qu.:0.000e+00   1st Qu.:0.0000000  
##  Median :0   Median :0   Median :0   Median :0.000e+00   Median :0.0000000  
##  Mean   :0   Mean   :0   Mean   :0   Mean   :5.914e-06   Mean   :0.0000021  
##  3rd Qu.:0   3rd Qu.:0   3rd Qu.:0   3rd Qu.:0.000e+00   3rd Qu.:0.0000000  
##  Max.   :0   Max.   :0   Max.   :0   Max.   :1.430e-03   Max.   :0.0007532  
##                                                                             
##       1977                1978        1979        1980               1981  
##  Min.   :0.000e+00   Min.   :0   Min.   :0   Min.   :0.000000   Min.   :0  
##  1st Qu.:0.000e+00   1st Qu.:0   1st Qu.:0   1st Qu.:0.000000   1st Qu.:0  
##  Median :0.000e+00   Median :0   Median :0   Median :0.000000   Median :0  
##  Mean   :3.272e-07   Mean   :0   Mean   :0   Mean   :0.017130   Mean   :0  
##  3rd Qu.:0.000e+00   3rd Qu.:0   3rd Qu.:0   3rd Qu.:0.003411   3rd Qu.:0  
##  Max.   :2.599e-04   Max.   :0   Max.   :0   Max.   :0.420122   Max.   :0  
##                                                                            
##       1982        1983               1984                1985         
##  Min.   :0   Min.   :0.000000   Min.   :0.000e+00   Min.   :0.000000  
##  1st Qu.:0   1st Qu.:0.000000   1st Qu.:0.000e+00   1st Qu.:0.000000  
##  Median :0   Median :0.000000   Median :0.000e+00   Median :0.000000  
##  Mean   :0   Mean   :0.000197   Mean   :5.556e-05   Mean   :0.001827  
##  3rd Qu.:0   3rd Qu.:0.000000   3rd Qu.:0.000e+00   3rd Qu.:0.000000  
##  Max.   :0   Max.   :0.011415   Max.   :7.692e-03   Max.   :0.087543  
##                                                                       
##       1986               1987                1988               1989         
##  Min.   :0.000000   Min.   :0.0000000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.0000000   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.000000   Median :0.0000000   Median :0.000000   Median :0.000000  
##  Mean   :0.007432   Mean   :0.0003432   Mean   :0.001416   Mean   :0.002745  
##  3rd Qu.:0.000000   3rd Qu.:0.0000000   3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :0.197918   Max.   :0.0449292   Max.   :0.171834   Max.   :0.173129  
##                                                                              
##       1990              1991        1992        1993          
##  Min.   :0.00000   Min.   :0   Min.   :0   Min.   :0.0000000  
##  1st Qu.:0.00000   1st Qu.:0   1st Qu.:0   1st Qu.:0.0000000  
##  Median :0.00000   Median :0   Median :0   Median :0.0000000  
##  Mean   :0.02199   Mean   :0   Mean   :0   Mean   :0.0002048  
##  3rd Qu.:0.00939   3rd Qu.:0   3rd Qu.:0   3rd Qu.:0.0000000  
##  Max.   :0.84354   Max.   :0   Max.   :0   Max.   :0.0205565  
##                                                               
##       1994                1995                1996               1997         
##  Min.   :0.0000000   Min.   :0.000e+00   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.0000000   1st Qu.:0.000e+00   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.0000000   Median :0.000e+00   Median :0.000000   Median :0.000000  
##  Mean   :0.0007679   Mean   :6.971e-05   Mean   :0.007337   Mean   :0.001736  
##  3rd Qu.:0.0000000   3rd Qu.:0.000e+00   3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :0.0779967   Max.   :6.484e-03   Max.   :0.788790   Max.   :0.126973  
##                                                                               
##       1998               1999                2000               2001          
##  Min.   :0.000000   Min.   :0.0000000   Min.   :0.000000   Min.   :0.000e+00  
##  1st Qu.:0.000000   1st Qu.:0.0000000   1st Qu.:0.000000   1st Qu.:0.000e+00  
##  Median :0.000000   Median :0.0000000   Median :0.000000   Median :0.000e+00  
##  Mean   :0.001915   Mean   :0.0004213   Mean   :0.007223   Mean   :7.072e-05  
##  3rd Qu.:0.000000   3rd Qu.:0.0000000   3rd Qu.:0.000000   3rd Qu.:0.000e+00  
##  Max.   :0.108919   Max.   :0.0388934   Max.   :0.393858   Max.   :8.372e-03  
##                                                                               
##       2002        2003               2004                2005         
##  Min.   :0   Min.   :0.000000   Min.   :0.0000000   Min.   :0.000000  
##  1st Qu.:0   1st Qu.:0.000000   1st Qu.:0.0000000   1st Qu.:0.000000  
##  Median :0   Median :0.000000   Median :0.0000000   Median :0.000000  
##  Mean   :0   Mean   :0.008926   Mean   :0.0043836   Mean   :0.002526  
##  3rd Qu.:0   3rd Qu.:0.000000   3rd Qu.:0.0001052   3rd Qu.:0.000000  
##  Max.   :0   Max.   :0.280990   Max.   :0.0906410   Max.   :0.244374  
##                                                                       
##       2006              2007                2008              2009        
##  Min.   :0.00000   Min.   :0.0000000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.0000000   1st Qu.:0.00000   1st Qu.:0.00000  
##  Median :0.00000   Median :0.0000000   Median :0.00000   Median :0.00000  
##  Mean   :0.01975   Mean   :0.0003268   Mean   :0.00844   Mean   :0.01501  
##  3rd Qu.:0.01886   3rd Qu.:0.0000000   3rd Qu.:0.00000   3rd Qu.:0.01630  
##  Max.   :0.42386   Max.   :0.0326652   Max.   :0.49764   Max.   :0.37049  
##                                                                           
##       2010               2011               2012               2013         
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.000000  
##  Median :0.000000   Median :0.000000   Median :0.000000   Median :0.000000  
##  Mean   :0.009539   Mean   :0.008219   Mean   :0.002717   Mean   :0.004286  
##  3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.000000  
##  Max.   :0.478107   Max.   :0.237154   Max.   :0.103264   Max.   :0.289583  
##                                                                             
##       2014                2015                2016               2017        
##  Min.   :0.000e+00   Min.   :0.0000000   Min.   :0.000000   Min.   :0.00000  
##  1st Qu.:0.000e+00   1st Qu.:0.0000000   1st Qu.:0.000000   1st Qu.:0.00000  
##  Median :0.000e+00   Median :0.0000000   Median :0.000000   Median :0.00000  
##  Mean   :8.902e-05   Mean   :0.0130128   Mean   :0.000608   Mean   :0.01409  
##  3rd Qu.:0.000e+00   3rd Qu.:0.0006807   3rd Qu.:0.000000   3rd Qu.:0.01322  
##  Max.   :4.485e-03   Max.   :0.4669166   Max.   :0.037603   Max.   :0.19362  
##                                                                              
##       2018               2019               2020              2021         
##  Min.   :0.000000   Min.   :0.000000   Min.   :0.00000   Min.   :0.000000  
##  1st Qu.:0.000000   1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.000000  
##  Median :0.000000   Median :0.000000   Median :0.00000   Median :0.000000  
##  Mean   :0.003066   Mean   :0.006002   Mean   :0.00565   Mean   :0.008415  
##  3rd Qu.:0.000000   3rd Qu.:0.000000   3rd Qu.:0.00000   3rd Qu.:0.000000  
##  Max.   :0.359185   Max.   :0.186836   Max.   :0.11843   Max.   :0.459053  
##                                                                            
##  buffer_area.y       feature_area
##  Min.   :  196260   Min.   :0    
##  1st Qu.: 6525640   1st Qu.:0    
##  Median :21686712   Median :0    
##  Mean   :28163286   Mean   :0    
##  3rd Qu.:45679477   3rd Qu.:0    
##  Max.   :78503934   Max.   :0    
## 

It’s missing the site and array columns from the reference data, but these will be added when we merge with the detections data, because the sites will match up and there are no issues with duplicate sites.

One last thing: there’s a duplicated column, buffer_area, which appears in both the harvest and HFI data sets. We didn’t use it to join the data because it repeats for each site, and we won’t need it for analyses, so let’s remove it now to clean things up.

covariates_all <- covariates_all %>% 
  
  select(!contains('buffer_area'))

I opened the data in my RStudio viewer window to double check this worked.
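A quick programmatic check can complement the viewer. The sketch below uses a small made-up tibble (`toy_covariates`, with hypothetical values) as a stand-in for `covariates_all`; the same `stopifnot()` line should work on the real data frame.

```r
library(dplyr)

# toy stand-in for covariates_all (hypothetical values)
toy_covariates <- tibble(
  site_number   = factor(c(1, 2)),
  buff_dist     = c(250L, 250L),
  buffer_area.x = c(196260, 196260),
  crop          = c(0, 0.1),
  buffer_area.y = c(196260, 196260)
)

# same removal step as above
toy_clean <- toy_covariates %>% 
  select(!contains('buffer_area'))

# stops with an error if any column name still contains 'buffer_area'
stopifnot(!any(grepl('buffer_area', names(toy_clean))))

names(toy_clean)
```

If the check passes silently, no column containing the keyword is left.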

Finish covariates data

Save data

Let’s also save this for future use.

# save joined data 
write_csv(covariates_all,
          'data/processed/srfn_covariates.csv')
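One thing to keep in mind when reading the file back in later: a csv does not store R column types, so a factor such as site_number will be guessed as a number unless the type is set on import. Below is a minimal sketch of a write/read round trip, using a made-up tibble and a temporary file (both are assumptions for illustration, not part of this workflow).

```r
library(readr)
library(dplyr)

# toy stand-in for covariates_all (hypothetical values)
toy_covariates <- tibble(
  site_number = factor(c(1, 2, 4)),
  buff_dist   = c(250L, 250L, 250L),
  crop        = c(0, 0.1, 0)
)

# write to a temporary file so nothing in data/processed is touched
tmp_file <- tempfile(fileext = '.csv')
write_csv(toy_covariates, tmp_file)

# re-import, setting the column types explicitly
toy_reloaded <- read_csv(tmp_file,
                         col_types = cols(site_number = col_factor(),
                                          buff_dist   = col_integer(),
                                          .default    = col_double()))

str(toy_reloaded)
```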

Remove messy data

Now that we’ve merged, cleaned, and reformatted the data we don’t need the list file or messy merged data anymore. Let’s remove these from the environment so we don’t accidentally use them.

rm(srfn_covariate_data,
   srfn_covariate_data_fixed)

Data formatting

There are too many covariates to include in the models individually and many of them describe similar HFI features.

The covariate_table and the README file in this repository include descriptions of each feature, taken from the ABMI human footprints wall-to-wall data download website for Year 2021. The same metadata can also be found in the relevant_literature folder of this repository (HFI_2021_v1_0_Metadata_Final.pdf).

Group covariates

As we prepare to lump the covariates together, we may need to reference the column names. Let’s print them now so we have them fresh in the console.

names(covariates_all)
##   [1] "site_number"                  "buff_dist"                   
##   [3] "airp_runway"                  "borrowpit_dry"               
##   [5] "borrowpit_wet"                "borrowpits"                  
##   [7] "camp_industrial"              "campground"                  
##   [9] "canal"                        "cfo"                         
##  [11] "clearing_unknown"             "clearing_wellpad_unconfirmed"
##  [13] "conventional_seismic"         "country_residence"           
##  [15] "crop"                         "cultivation_abandoned"       
##  [17] "dugout"                       "facility_other"              
##  [19] "facility_unknown"             "fruit_vegetables"            
##  [21] "golfcourse"                   "greenspace"                  
##  [23] "grvl_sand_pit"                "harvest_area"                
##  [25] "harvest_area_white_zone"      "lagoon"                      
##  [27] "landfill"                     "low_impact_seismic"          
##  [29] "mill"                         "mines_pitlake"               
##  [31] "misc_oil_gas_facility"        "oil_gas_plant"               
##  [33] "open_pit_mine"                "pipeline"                    
##  [35] "recreation"                   "reservoir"                   
##  [37] "residence_clearing"           "rlwy_mlt_track"              
##  [39] "rlwy_sgl_track"               "rlwy_spur"                   
##  [41] "road_gravel_1l"               "road_gravel_2l"              
##  [43] "road_paved_1l"                "road_paved_2l"               
##  [45] "road_paved_3l"                "road_paved_4l"               
##  [47] "road_paved_div"               "road_paved_undiv_1l"         
##  [49] "road_paved_undiv_2l"          "road_unclassified"           
##  [51] "road_unimproved"              "road_unpaved_2l"             
##  [53] "road_winter"                  "rough_pasture"               
##  [55] "runway"                       "rural_residence"             
##  [57] "sump"                         "surrounding_veg"             
##  [59] "tame_pasture"                 "trail"                       
##  [61] "transfer_station"             "transmission_line"           
##  [63] "truck_trail"                  "urban_industrial"            
##  [65] "urban_residence"              "vegetated_edge_railways"     
##  [67] "vegetated_edge_roads"         "well_cleared_not_confirmed"  
##  [69] "well_cleared_not_drilled"     "well_aband"                  
##  [71] "well_bitumen"                 "well_cased"                  
##  [73] "well_gas"                     "well_oil"                    
##  [75] "well_other"                   "well_unknown"                
##  [77] "110"                          "120"                         
##  [79] "20"                           "210"                         
##  [81] "220"                          "230"                         
##  [83] "33"                           "34"                          
##  [85] "50"                           "1940"                        
##  [87] "1950"                         "1960"                        
##  [89] "1966"                         "1967"                        
##  [91] "1968"                         "1969"                        
##  [93] "1970"                         "1971"                        
##  [95] "1972"                         "1973"                        
##  [97] "1974"                         "1975"                        
##  [99] "1976"                         "1977"                        
## [101] "1978"                         "1979"                        
## [103] "1980"                         "1981"                        
## [105] "1982"                         "1983"                        
## [107] "1984"                         "1985"                        
## [109] "1986"                         "1987"                        
## [111] "1988"                         "1989"                        
## [113] "1990"                         "1991"                        
## [115] "1992"                         "1993"                        
## [117] "1994"                         "1995"                        
## [119] "1996"                         "1997"                        
## [121] "1998"                         "1999"                        
## [123] "2000"                         "2001"                        
## [125] "2002"                         "2003"                        
## [127] "2004"                         "2005"                        
## [129] "2006"                         "2007"                        
## [131] "2008"                         "2009"                        
## [133] "2010"                         "2011"                        
## [135] "2012"                         "2013"                        
## [137] "2014"                         "2015"                        
## [139] "2016"                         "2017"                        
## [141] "2018"                         "2019"                        
## [143] "2020"                         "2021"                        
## [145] "feature_area"
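Because the grouping relies on matching keywords in these names, it can help to preview which columns a keyword picks up before summing anything. The sketch below uses a toy tibble with a handful of the real column names (the values are made up) and lists the matches for three keywords.

```r
library(dplyr)

# toy stand-in with a few of the real column names (values are made up)
toy_covariates <- tibble(
  clearing_unknown             = 0.1,
  clearing_wellpad_unconfirmed = 0.2,
  residence_clearing           = 0.3,
  well_gas                     = 0.4,
  well_oil                     = 0.5,
  vegetated_edge_roads         = 0.6,
  road_gravel_1l               = 0.7
)

# which columns would each keyword pick up?
well_matches     <- toy_covariates %>% select(contains('well')) %>% names()
clearing_matches <- toy_covariates %>% select(contains('clearing')) %>% names()
road_matches     <- toy_covariates %>% select(contains('road')) %>% names()

well_matches
clearing_matches
road_matches

# columns matched by more than one keyword would be summed into more than one group
intersect(well_matches, clearing_matches)
```

In this toy example 'clearing_wellpad_unconfirmed' is matched by both 'well' and 'clearing', and 'vegetated_edge_roads' is matched by 'road', which is the kind of overlap worth checking for on the full data.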

Quick note to check with Emerald: none of the ris features that came up in the second extraction of the OSM data are present here.

Now we will use the mutate() function with some tidyverse trickery (i.e., nesting across() and contains() in rowSums()) to sum across each observation (row) by searching for various character strings. If the variables we want to sum don’t share a common character string, we provide each one individually. We can also combine these methods (e.g., with ‘facilities’ [see code]).
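Before the full grouping code, here is a minimal sketch of the same pattern on a toy tibble (the values are made up, and only a few column names are borrowed from the real data). It shows a keyword sum, a sum of individually named columns, and what happens when a keyword matches nothing.

```r
library(dplyr)

# toy stand-in for covariates_all (hypothetical values)
toy_hfi <- tibble(
  site_number    = c(1, 2),
  rlwy_sgl_track = c(0.01, 0),
  rlwy_spur      = c(0.02, 0.03),
  campground     = c(0, 0.1),
  golfcourse     = c(0.2, 0)
)

toy_grouped <- toy_hfi %>% 
  mutate(
    # keyword match: sums every column whose name contains 'rlwy'
    railways = rowSums(across(contains('rlwy'))),
    
    # no shared keyword: add the columns one by one
    recreation = campground + golfcourse,
    
    # a keyword with no matching columns gives a column of zeros
    reclaimed = rowSums(across(contains('reclaimed'))),
    
    # drop the columns that were used to build the new ones
    .keep = 'unused'
  )

toy_grouped
```

With .keep = 'unused', only site_number and the three new columns remain in the toy result.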

hfi_covariates_grouped <- covariates_all %>% 
  
  # rename 'vegetated_edge_roads' so that we can use 'road' as a keyword to group roads without including this feature
  rename('vegetated_edge_rds' = vegetated_edge_roads) %>% 
  
  # within the mutate function create new column names for the grouped variables
  mutate(
    # borrowpits
    borrowpits = rowSums(across(contains('borrowpit'))) + # here we use rowSums() with across() and contains() to sum across each row any values for columns that contain the keyword above. Be careful that the string (keyword) provided doesn't match any variables that you don't want to include!
      
      dugout +
      lagoon +
      sump,
    
    
    # non-harvest clearings
    clearings = rowSums(across(contains('clearing'))) +
      runway,
    
    # cultivations
    cultivation = crop + 
      cultivation_abandoned +
      fruit_vegetables +
      rough_pasture +
      tame_pasture,
    
    # harvest areas
    harvest = rowSums(across(contains('harvest'))),
    
    # industrial facilities
    facilities = rowSums(across(contains('facility'))) +
      rowSums(across(contains('plant'))) +
      camp_industrial +
      mill +
      urban_industrial,
    
    # mine areas
    mines = rowSums(across(contains('mine'))) +
      rowSums(across(contains('tailing'))) +
      grvl_sand_pit,
    
    # railways
    railways = rowSums(across(contains('rlwy'))),
    
    # reclaimed areas
    reclaimed = rowSums(across(contains('reclaimed'))),
    
    # recreation areas
    recreation = campground +
      golfcourse +
      greenspace +
      recreation,
    
    # residential areas (can't use residence as keyword because 'residence_clearing' is in clearing unless we rearrange groupings or rename that one)
    residential = country_residence +
      rural_residence +
      urban_residence,
    
    # roads (we renamed 'vegetated_edge_roads' above to 'vegetated_edge_rds' so we can use 'road' as the keyword here, which saves a lot of coding as there are many road variables)
    roads = rowSums(across(contains('road'))) +
      airp_runway +
      transfer_station,
    
    # seismic lines
    seismic_lines = conventional_seismic,
    
    # 3D seismic lines (put the 3D at the end though to make R happy)
    seismic_lines_3D = low_impact_seismic,
    
    # transmission lines
    transmission_lines = rowSums(across(contains('transmission'))),
    
    # trails
    trails = rowSums(across(contains('trail'))),
    
    # vegetated edges
    veg_edges = rowSums(across(contains('vegetated'))) +
      surrounding_veg,
    
    # man-made water features
    water = canal +
      reservoir,
    
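    # well sites (note: contains('well') also matches 'clearing_wellpad_unconfirmed', so that column is counted in both clearings and wells; need to check whether this is intended)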
    wells = rowSums(across(contains('well'))),
    
    # we will group harvest into two 'bins': years 2000 onward and pre-2000. The code below only works if the year columns are ordered numerically and there are no non-harvest columns between them
    harvest_pre2000 = rowSums(across(`1940`:`1999`)),
    harvest_2000 = rowSums(across(`2000`:`2021`)),
    
    # remove columns that were used to create new columns to tidy the data frame
    .keep = 'unused') %>% 
  
  # now let's rename the landcover classes, which are currently just numbers and not very informative
  rename(
    lc_grassland = '110',
    lc_coniferous = '210',
    lc_broadleaf = '220',
    lc_mixed = '230',
    lc_developed = '34',
    lc_shrub = '50',
    lc_water = '20',
    lc_bareground = '33',
    lc_agriculture = '120') %>% 
  
  # reorder all columns alphabetically (site_number and buff_dist are moved back to the front in the next step)
  select(order(colnames(.))) %>% 
  
  # we want to move the columns that aren't HFI features or landcover to the front
  relocate(.,
           c(site_number,
             buff_dist)) %>% 
  
  # reorder variables so the veg data is after all the HFI data
  relocate(starts_with('lc_'),
           .after = wells)
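
As the comment in the chunk above notes, the harvest bins rely on the year columns being contiguous and in numeric order. Below is a minimal sketch of an order-independent alternative. It assumes the harvest columns are the only ones named with a four-digit year, and it uses made-up toy data (toy_harvest and the other object names are hypothetical, not part of this script).

```r
library(dplyr)

# toy data: year-named harvest columns deliberately out of order,
# with an unrelated column ('roads') sitting between them
toy_harvest <- tibble(
  site_number = c('1', '2'),
  `2005` = c(0.10, 0.00),
  roads  = c(0.01, 0.02),
  `1995` = c(0.20, 0.30),
  `2010` = c(0.05, 0.00),
  `1980` = c(0.00, 0.10)
)

# find columns whose names are four-digit years, then split them by numeric value
year_cols     <- grep('^[0-9]{4}$', names(toy_harvest), value = TRUE)
pre2000_cols  <- year_cols[as.integer(year_cols) < 2000]
post2000_cols <- year_cols[as.integer(year_cols) >= 2000]

toy_binned <- toy_harvest %>%
  mutate(
    harvest_pre2000 = rowSums(across(all_of(pre2000_cols))),
    harvest_2000    = rowSums(across(all_of(post2000_cols))),
    # drop the year columns once they have been summed
    .keep = 'unused'
  )

toy_binned
```

Because the columns are chosen by name rather than by position, this should keep working if the column order changes or if new years are added.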

# see what's left
names(hfi_covariates_grouped)
##  [1] "site_number"        "buff_dist"          "borrowpits"        
##  [4] "cfo"                "clearings"          "cultivation"       
##  [7] "facilities"         "feature_area"       "harvest"           
## [10] "harvest_2000"       "harvest_pre2000"    "landfill"          
## [13] "mines"              "pipeline"           "railways"          
## [16] "reclaimed"          "recreation"         "residential"       
## [19] "roads"              "seismic_lines"      "seismic_lines_3D"  
## [22] "trails"             "transmission_lines" "veg_edges"         
## [25] "water"              "wells"              "lc_agriculture"    
## [28] "lc_bareground"      "lc_broadleaf"       "lc_coniferous"     
## [31] "lc_developed"       "lc_grassland"       "lc_mixed"          
## [34] "lc_shrub"           "lc_water"
# check the structure of new data
str(hfi_covariates_grouped)
## tibble [1,200 × 35] (S3: tbl_df/tbl/data.frame)
##  $ site_number       : Factor w/ 60 levels "1","2","4","6",..: 1 2 3 4 5 6 7 8 9 10 ...
##  $ buff_dist         : int [1:1200] 250 250 250 250 250 250 250 250 250 250 ...
##  $ borrowpits        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ cfo               : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ clearings         : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ cultivation       : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ facilities        : num [1:1200] 0 0.131 0 0 0 ...
##  $ feature_area      : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ harvest           : num [1:1200] 0.432 0.342 0 0.388 0.424 ...
##  $ harvest_2000      : num [1:1200] 0.355 0 0 0.179 0.424 ...
##  $ harvest_pre2000   : num [1:1200] 0.0763 0.3418 0 0.209 0 ...
##  $ landfill          : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ mines             : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ pipeline          : num [1:1200] 0 0.148 0.0148 0 0 ...
##  $ railways          : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ reclaimed         : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ recreation        : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ residential       : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ roads             : num [1:1200] 0.00 5.99e-02 7.05e-03 7.11e-06 6.75e-03 ...
##  $ seismic_lines     : num [1:1200] 0.00 5.41e-05 0.00 0.00 0.00 ...
##  $ seismic_lines_3D  : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ trails            : num [1:1200] 0 0 0.011 0 0 ...
##  $ transmission_lines: num [1:1200] 0 0 0 0 0 ...
##  $ veg_edges         : num [1:1200] 0 0.09955 0.0129 0.00112 0.01425 ...
##  $ water             : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ wells             : num [1:1200] 0 0 0.0183 0.0318 0.0332 ...
##  $ lc_agriculture    : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ lc_bareground     : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ lc_broadleaf      : num [1:1200] 0 0.18 0 0 0 ...
##  $ lc_coniferous     : num [1:1200] 0.847 0 0.743 0.442 0.284 ...
##  $ lc_developed      : num [1:1200] 0 0.4514 0.0716 0.00837 0.04522 ...
##  $ lc_grassland      : num [1:1200] 0 0.3608 0.0618 0 0 ...
##  $ lc_mixed          : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
##  $ lc_shrub          : num [1:1200] 0.15301 0.00776 0.12401 0.54941 0.6703 ...
##  $ lc_water          : num [1:1200] 0 0 0 0 0 0 0 0 0 0 ...
# check summary of new data
summary(hfi_covariates_grouped)
##   site_number     buff_dist      borrowpits             cfo   
##  1      :  20   Min.   : 250   Min.   :0.0000000   Min.   :0  
##  2      :  20   1st Qu.:1438   1st Qu.:0.0000000   1st Qu.:0  
##  4      :  20   Median :2625   Median :0.0002555   Median :0  
##  6      :  20   Mean   :2625   Mean   :0.0007907   Mean   :0  
##  10     :  20   3rd Qu.:3812   3rd Qu.:0.0009892   3rd Qu.:0  
##  12     :  20   Max.   :5000   Max.   :0.0296372   Max.   :0  
##  (Other):1080                                                 
##    clearings          cultivation        facilities        feature_area
##  Min.   :0.0000000   Min.   :0.00000   Min.   :0.000000   Min.   :0    
##  1st Qu.:0.0000000   1st Qu.:0.00000   1st Qu.:0.000000   1st Qu.:0    
##  Median :0.0001076   Median :0.00000   Median :0.000000   Median :0    
##  Mean   :0.0011496   Mean   :0.05684   Mean   :0.001724   Mean   :0    
##  3rd Qu.:0.0016142   3rd Qu.:0.00000   3rd Qu.:0.001291   3rd Qu.:0    
##  Max.   :0.0281760   Max.   :0.62457   Max.   :0.131389   Max.   :0    
##                                                                        
##     harvest        harvest_2000      harvest_pre2000      landfill
##  Min.   :0.0000   Min.   :0.000000   Min.   :0.00000   Min.   :0  
##  1st Qu.:0.0879   1st Qu.:0.008954   1st Qu.:0.00000   1st Qu.:0  
##  Median :0.2466   Median :0.133834   Median :0.05305   Median :0  
##  Mean   :0.2517   Mean   :0.142348   Mean   :0.09638   Mean   :0  
##  3rd Qu.:0.3814   3rd Qu.:0.221274   3rd Qu.:0.14270   3rd Qu.:0  
##  Max.   :0.9863   Max.   :0.856826   Max.   :0.98631   Max.   :0  
##                                                                   
##      mines             pipeline          railways           reclaimed
##  Min.   :0.000000   Min.   :0.00000   Min.   :0.0000000   Min.   :0  
##  1st Qu.:0.000000   1st Qu.:0.00000   1st Qu.:0.0000000   1st Qu.:0  
##  Median :0.000000   Median :0.00450   Median :0.0000000   Median :0  
##  Mean   :0.001116   Mean   :0.01031   Mean   :0.0001036   Mean   :0  
##  3rd Qu.:0.000000   3rd Qu.:0.01523   3rd Qu.:0.0000000   3rd Qu.:0  
##  Max.   :0.416663   Max.   :0.14867   Max.   :0.0036376   Max.   :0  
##                                                                      
##    recreation         residential           roads          seismic_lines     
##  Min.   :0.000e+00   Min.   :0.000000   Min.   :0.000000   Min.   :0.000000  
##  1st Qu.:0.000e+00   1st Qu.:0.000000   1st Qu.:0.002420   1st Qu.:0.001827  
##  Median :0.000e+00   Median :0.000000   Median :0.007065   Median :0.003612  
##  Mean   :8.288e-05   Mean   :0.002469   Mean   :0.007511   Mean   :0.004028  
##  3rd Qu.:0.000e+00   3rd Qu.:0.000000   3rd Qu.:0.011097   3rd Qu.:0.005451  
##  Max.   :8.322e-03   Max.   :0.091914   Max.   :0.059875   Max.   :0.030028  
##                                                                              
##  seismic_lines_3D        trails         transmission_lines    veg_edges       
##  Min.   :0.000e+00   Min.   :0.000000   Min.   :0.0000000   Min.   :0.000000  
##  1st Qu.:0.000e+00   1st Qu.:0.001088   1st Qu.:0.0000000   1st Qu.:0.003484  
##  Median :0.000e+00   Median :0.002230   Median :0.0000000   Median :0.012634  
##  Mean   :1.828e-05   Mean   :0.002450   Mean   :0.0011787   Mean   :0.013908  
##  3rd Qu.:0.000e+00   3rd Qu.:0.003230   3rd Qu.:0.0003164   3rd Qu.:0.021155  
##  Max.   :6.059e-03   Max.   :0.082349   Max.   :0.0460439   Max.   :0.099551  
##                                                                               
##      water               wells           lc_agriculture    lc_bareground      
##  Min.   :0.0000000   Min.   :0.0000000   Min.   :0.00000   Min.   :0.000e+00  
##  1st Qu.:0.0000000   1st Qu.:0.0008336   1st Qu.:0.00000   1st Qu.:0.000e+00  
##  Median :0.0000000   Median :0.0089454   Median :0.00000   Median :0.000e+00  
##  Mean   :0.0002978   Mean   :0.0103535   Mean   :0.03587   Mean   :4.182e-05  
##  3rd Qu.:0.0000000   3rd Qu.:0.0175866   3rd Qu.:0.00000   3rd Qu.:0.000e+00  
##  Max.   :0.0139309   Max.   :0.0957837   Max.   :0.49000   Max.   :3.641e-03  
##                                                                               
##   lc_broadleaf    lc_coniferous      lc_developed      lc_grassland     
##  Min.   :0.0000   Min.   :0.00000   Min.   :0.00000   Min.   :0.000000  
##  1st Qu.:0.1463   1st Qu.:0.03179   1st Qu.:0.01782   1st Qu.:0.006635  
##  Median :0.3010   Median :0.23137   Median :0.05463   Median :0.034291  
##  Mean   :0.3502   Mean   :0.23902   Mean   :0.05948   Mean   :0.055123  
##  3rd Qu.:0.5250   3rd Qu.:0.38303   3rd Qu.:0.08856   3rd Qu.:0.068804  
##  Max.   :1.0000   Max.   :0.84699   Max.   :0.45140   Max.   :0.883334  
##                                                                         
##     lc_mixed          lc_shrub          lc_water      
##  Min.   :0.00000   Min.   :0.00000   Min.   :0.00000  
##  1st Qu.:0.00000   1st Qu.:0.04624   1st Qu.:0.00000  
##  Median :0.01965   Median :0.10172   Median :0.00000  
##  Mean   :0.04277   Mean   :0.15602   Mean   :0.06146  
##  3rd Qu.:0.06350   3rd Qu.:0.20080   3rd Qu.:0.03622  
##  Max.   :0.93137   Max.   :0.93212   Max.   :0.84113  
## 

Okay, this gives us a smaller data set to work with, but I think we can clean it up further. Based on the summaries, there are several features we don’t have a lot of data for. We can remove any that are all zeros and check the others visually with histograms.

hfi_covariates_grouped <- hfi_covariates_grouped %>%
  select(where(~ !all(. == 0)))
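
To see what this predicate does, here is a small sketch on made-up data (toy and toy_nonzero are hypothetical names). It also shows a slightly more defensive variant, on the assumption that a column might contain NA values in some future year. The summary above suggests the current data has none.

```r
library(dplyr)

toy <- tibble(
  site_number = factor(c('1', '2', '3')),
  roads       = c(0, 0.02, 0.01),
  landfill    = c(0, 0, 0),
  cfo         = c(0, NA, 0)
)

# With the predicate used above, the 'cfo' column evaluates to NA instead of
# TRUE or FALSE, which where() does not accept:
# toy %>% select(where(~ !all(. == 0)))

# Defensive variant: only test numeric columns and ignore NAs.
# Note that a column containing only 0s and NAs is treated as all zero and dropped.
toy_nonzero <- toy %>%
  select(where(~ !(is.numeric(.x) && all(.x == 0, na.rm = TRUE))))

names(toy_nonzero)
```

Here landfill and cfo are dropped, while site_number is kept because the check only applies to numeric columns.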

Grouped histograms

Let’s look at the histograms again and see if we need to remove any features or feature groups without enough data. I’m not worrying about the yearly harvest data yet.

# Define the starting column and get all column names from that point
start_col <- 'borrowpits'
columns_to_plot <- names(hfi_covariates_grouped)[which(names(hfi_covariates_grouped) == start_col):ncol(hfi_covariates_grouped)]

# Loop over the selected columns and create histograms
for (col in columns_to_plot) {
  hist(hfi_covariates_grouped[[col]], main = col, xlab = col)
}

> IMO we don’t have enough variation in the data to use the following features/feature groups:

  • borrowpits
  • clearings
  • cultivation ?
  • facilities
  • mines
  • railways
  • recreation
  • residential
  • seismic_lines_3D
  • trails
  • transmission_lines
  • water (industrial sources)
  • lc_agriculture ?
  • lc_bareground
  • lc_water ?
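
The list above is a visual judgement from the histograms. One way to back it up with numbers is to compute, for each feature, the proportion of rows with a non-zero value. This is only a sketch on made-up data (toy and sparsity are hypothetical names), and any cut-off for "not enough data" remains a judgement call.

```r
library(dplyr)
library(tidyr)

toy <- tibble(
  site_number = factor(c('1', '2', '3', '4')),
  roads       = c(0.01, 0.02, 0, 0.03),
  railways    = c(0, 0, 0, 0.001),
  mines       = c(0, 0, 0, 0)
)

# proportion of rows with a non-zero value for each numeric column,
# reshaped to one row per feature and sorted from sparsest to densest
sparsity <- toy %>%
  summarise(across(where(is.numeric), ~ mean(.x > 0))) %>%
  pivot_longer(everything(),
               names_to = 'feature',
               values_to = 'prop_nonzero') %>%
  arrange(prop_nonzero)

sparsity
```

In the real data buff_dist is also numeric, so it would need to be excluded before summarising.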

Also, there’s not a lot of data for the following features, which are similar and of interest to OSM. In the past they’ve been grouped together, and we will do the same here:

  • borrowpits
  • facilities
  • mines

For this analysis we will combine these, along with clearings, into a single industrial variable.

Group covariates further

So let’s modify the data and remove those features for now. This step will likely need to be changed each year.

hfi_covariates_grouped_2 <- hfi_covariates_grouped %>% 
  
  # create column industrial
  mutate(
    industrial = borrowpits +
      clearings +
      facilities +
      mines,
    
    # remove columns we used to make this variable
    .keep = 'unused') %>% 
  
  # remove other features we don't need
  select(!c(cultivation,
            recreation,
            residential,
            seismic_lines_3D,
            trails,
            transmission_lines,
            water,
            railways,
            lc_bareground,
            lc_water)) %>% 
  
  # order again: sort all columns alphabetically (site_number and buff_dist are moved back to the front in the next step)
  select(order(colnames(.))) %>% 
  
  # we want to move the columns that aren't HFI features or landcover to the front
  relocate(.,
           c(site_number,
             buff_dist)) %>% 
  
  # reorder variables so the veg data is after all the HFI data
  relocate(starts_with('lc_'),
           .after = wells)
  
 

# check that it worked
names(hfi_covariates_grouped_2)
##  [1] "site_number"     "buff_dist"       "harvest"         "harvest_2000"   
##  [5] "harvest_pre2000" "industrial"      "pipeline"        "roads"          
##  [9] "seismic_lines"   "veg_edges"       "wells"           "lc_agriculture" 
## [13] "lc_broadleaf"    "lc_coniferous"   "lc_developed"    "lc_grassland"   
## [17] "lc_mixed"        "lc_shrub"
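
The three reordering steps are now repeated in two chunks. If they end up being needed again, they could be wrapped in a small helper. This is a sketch only: order_covariates() is a hypothetical function, not part of this script, and it uses last_col() instead of hard-coding wells as the final HFI column.

```r
library(dplyr)

# hypothetical helper: sort columns alphabetically, move the id columns to the
# front, then move the landcover ('lc_') columns to the end
order_covariates <- function(data, id_cols = c('site_number', 'buff_dist')) {
  data %>%
    select(order(colnames(.))) %>%
    relocate(all_of(id_cols)) %>%
    relocate(starts_with('lc_'), .after = last_col())
}

# toy data with the columns in an arbitrary order
toy <- tibble(
  wells       = 0.01,
  lc_shrub    = 0.20,
  buff_dist   = 250L,
  roads       = 0.02,
  site_number = factor('1')
)

names(order_covariates(toy))
```

With the toy data this returns the columns in the order site_number, buff_dist, roads, wells, lc_shrub.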

Let’s look at the histograms again

# Define the starting column and get all column names from that point
start_col <- 'harvest'
columns_to_plot <- names(hfi_covariates_grouped_2)[which(names(hfi_covariates_grouped_2) == start_col):ncol(hfi_covariates_grouped_2)]

# Loop over the selected columns and create histograms
for (col in columns_to_plot) {
  hist(hfi_covariates_grouped_2[[col]], main = col, xlab = col)
}

  • definitely need to drop industrial
  • possibly agriculture?

hfi_covariates_grouped_2 <- hfi_covariates_grouped_2 %>% 
  
  select(!c(industrial))

Remove messy data

Let’s remove the data frames we no longer need.

# remove intermediate objects, skipping any that don't exist in the environment
# (the original call warned that 'covariates_fixed' and 'covariates_grouped' were not found,
# so these names are probably out of date and should be checked against ls())
rm(list = intersect(c('covariates_all',
                      'covariates_fixed',
                      'covariates_grouped'),
                    ls()))

Add full site name

Now we need to add a column with the full site name from the reference data so this data can easily be joined with the detection data later

Import ref data

First let’s read in the reference data

sites <- read_csv('data/raw/reference.csv',
                  
                  # specify column types
                  col_types = cols(.default = col_factor())) %>% 

  # I find the original column names confusing, so I'm quickly renaming them here
  rename(site_number = site,
         site = real_site)

Now let’s join them and be done with this!

covariates_final <- hfi_covariates_grouped_2 %>% 
  
  # join 
  left_join(sites,
            by = 'site_number') %>% 
  
  # relocate site to front
  relocate(site,
           .after = site_number)
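
Before saving, it can be worth checking that the join behaved as expected. The sketch below uses made-up data (the toy_ objects and the site names are hypothetical) to check two things: that the row count is unchanged, and which site numbers, if any, have no match in the reference data. The toy keys are character rather than factor to keep the example simple.

```r
library(dplyr)

toy_covariates <- tibble(
  site_number = c('1', '2', '4'),
  roads       = c(0.01, 0.02, 0)
)

toy_sites <- tibble(
  site_number = c('1', '2'),
  site        = c('A_01', 'A_02')   # hypothetical full site names
)

toy_joined <- toy_covariates %>%
  left_join(toy_sites,
            by = 'site_number')

# a left join keeps the row count when site_number is unique in the reference data;
# extra rows would mean duplicated site numbers in the reference file
nrow(toy_joined) == nrow(toy_covariates)

# site numbers with no match in the reference data (these get NA for site)
unmatched <- toy_covariates %>%
  anti_join(toy_sites,
            by = 'site_number') %>%
  distinct(site_number)

unmatched
```

In this toy example site number 4 is unmatched, so its site value is NA after the join.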

Save grouped data

Let’s save this data now that it’s all formatted and grouped.

write_csv(covariates_final,
          'data/processed/srfn_covariates_grouped.csv')

We are done with this script for now. We have a nice clean data set with the HFI and harvest covariates grouped how we could use them in an analysis, and the VEG covariates renamed so we don’t have to memorize or look up what the numbers mean.